Section One

INTRODUCTION 

This dissertation comprises a number of distinct essays linked by
a common theme: every section deals with one aspect or another of the
theory of individual choice behavior. Section Two focuses on choices
involving time; Section Three focuses on how information affects
choices involving uncertainty. The final section, Section Four,
reports on some empirical studies relating to the theoretical
developments of the preceding two sections. While there is a common
theme to the dissertation, the individual sections reflect
considerable diversity. This is due in large part to the inherent
diversity of the subject matter. Disciplines ranging as broadly as
statistics, psychology, philosophy, and economics are concerned in
one way or another with aspects of the theory of individual choice
behavior. The studies reported here reflect the diversity of these
disciplinary viewpoints; nevertheless, there is some emphasis on
relating the problems considered to economic situations.

I would like to begin these introductory comments by providing a
classification of alternative ways of looking at individual choice
behavior. Many such classifications are possible; Luce and Suppes [6],
for example, dichotomize theories of individual choice behavior in
three separate ways. The first is whether the theory uses algebraic
or probabilistic tools. The second is whether or not the decisions the
individuals face involve uncertainty, and the third is whether the
theories provide a complete ranking of all the alternatives available
to the individual or merely specify which alternative he will select
(or the probability that he will select each alternative). With three
two-way splits they come up with a possible eight-way classification
of theories, though a number of these boxes are not filled. The
classification that I would propose is somewhat different. First, I
would distinguish between normative and descriptive theories; this
corresponds in a rough way to Luce and Suppes' distinction between
algebraic and probabilistic theories. The second distinction I would
make is again concerned with certainty versus uncertainty, though I
would make this a three-way distinction. The first category would be
decisions under certainty, the second decisions under uncertainty with
no opportunity to utilize information, and the third decisions under
uncertainty that do involve the opportunity to utilize information.
The final distinction that I would make, and this is of particular
relevance to economists, is that between choices involving time and
those that do not. With two two-way classifications and one three-way
classification I thus come up with a total of 12 alternative boxes
into which theories of individual choice behavior can be put. It is
not my intention to pursue this classification in detail but merely to
state it here at the outset to place things in some perspective.
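
The count of twelve boxes can be checked by enumerating the three
distinctions directly. The following is only a minimal sketch; the
label strings and names such as `classification_boxes` are
illustrative choices made here and do not appear in the dissertation.

```python
from itertools import product

# Labels paraphrase the three distinctions drawn in the text; the
# names are illustrative assumptions, not the author's terminology.
STANCE = ("normative", "descriptive")
KNOWLEDGE = (
    "certainty",
    "uncertainty without information",
    "uncertainty with information",
)
TIME = ("involves time", "no time element")


def classification_boxes():
    """Return every (stance, knowledge, time) combination."""
    return list(product(STANCE, KNOWLEDGE, TIME))


boxes = classification_boxes()
for number, box in enumerate(boxes, start=1):
    print(number, *box, sep=" | ")
```

Two two-way splits and one three-way split give 2 x 3 x 2 = 12
combinations, one per box.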

I would like now to indicate in a very brief way a number of the
areas in which theories now exist concerning individual choice
behavior. By far the best developed theory within economics is that of
individual choice behavior under certainty when the basic constraints
are those determined by prices and income. The keystone of this theory
is the theory of consumer demand first developed by E. Slutsky and J.
Hicks. Another important area for economics is, as mentioned, the
theory of choice involving time. The work of Fisher in this area is
generally considered seminal and is discussed further in Section Two
of this dissertation.

There are quite a number of alternative theories for choice under
uncertainty having no information component. Axioms characterizing
most of these theories, under the provision that uncertainty be in
some sense "total," are succinctly summarized in Milnor's [7]
well-known paper. The normative theory of choice under uncertainty
involving no information component that is now increasingly accepted,
and the one that I personally accept, was first sketched by Frank
Ramsey [8] and developed with axiomatic care by L. J. Savage [10]. It
is proved that if individuals act in accord with the axioms of this
theory they act as though they were maximizing the expectation of a
utility function against a unique subjective probability distribution.
Von Neumann and Morgenstern [1?] provided the key proof of the
existence of the utility function, though under the assumption that
the probabilities of the events were exogenously given.

In psychology, as one would expect, the emphasis has been much
more on descriptive than on normative theories, though there is often
a deliberate tendency to undermine this distinction by such
psychologists as Luce and Suppes. A good deal of psychology has dealt
with theories of information usage, that is, how people process
information in order to reduce uncertainty or change the state of
their beliefs. It is easy to discern two main trends in the
psychological literature that deals with this in a somewhat formal
way. The first of these trends is in a school led by Ward Edwards at
the University of Michigan; their work has focused on studies of how
Bayes' theorem is used by subjects in actual information processing
tasks to update their beliefs. A general conclusion is that subjects
move in the direction that the normative theory would have them move
but not far enough; that is, they act as degraded Bayesian information
processors. A quite different school in psychology is much more in the
tradition of the stimulus-response theories first developed early in
the century. These psychologists generally view learning as a Markov
process, though there are a number of alternatives and extensions to
this way of looking at learning. Psychologists now working in this
field base much of their work on early papers by W. K. Estes (see, for
example, [2]) and the book by Bush and Mosteller [1].
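
The notion of a "degraded Bayesian information processor" mentioned
above can be illustrated numerically. The sketch below is one simple
way to express the idea, not the model used by Edwards' group: it
assumes the subject raises the likelihood ratio to an exponent
(called `conservatism` here, a name introduced only for this
illustration) that is less than one, so that beliefs move in the
normative direction but not as far.

```python
def bayes_posterior(prior, likelihood_ratio, conservatism=1.0):
    """Posterior probability of hypothesis H after one observation.

    prior: P(H) before the observation, strictly between 0 and 1.
    likelihood_ratio: P(data | H) / P(data | not H).
    conservatism: exponent applied to the likelihood ratio. 1.0 gives
        the normative Bayesian update; values in (0, 1) damp the
        revision. The exponent is an illustrative assumption.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio ** conservatism
    return posterior_odds / (1.0 + posterior_odds)


normative = bayes_posterior(0.5, 4.0)                   # full revision
degraded = bayes_posterior(0.5, 4.0, conservatism=0.5)  # damped revision
print(normative, degraded)
```

With even prior odds and a likelihood ratio of 4, the normative
posterior exceeds the damped one, and both exceed the prior: the
damped subject moves in the right direction but stops short.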

Another tendency in psychology has been to attempt to formulate
descriptive (usually probabilistic) theories of choice under both
certainty and uncertainty. A number of these theories were first put
forward by Luce [4], and a variety of theories of this sort, including
some developed by economists, are reviewed in detail by Luce and
Suppes [6]. A feature of most of these theories is some sort of
attempt to deal with observed intransitivities in actual choices. One
way of handling this is to assign numbers (usually called "response
strengths") to each alternative; the probability of making any
particular choice is, then, proportional to its response strength.
Another way of handling this problem is to use semiorders rather than
weak orders on the underlying preference space; Roberts [9] discusses
the relations between these two approaches.
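
The response-strength rule described above can be sketched in a few
lines. The strengths below are hypothetical numbers chosen only for
illustration, and `choice_probabilities` is a name introduced here.

```python
def choice_probabilities(response_strengths):
    """Map each alternative to strength / (sum of all strengths)."""
    if any(s < 0 for s in response_strengths.values()):
        raise ValueError("response strengths must be non-negative")
    total = sum(response_strengths.values())
    if total <= 0:
        raise ValueError("response strengths must have a positive sum")
    return {alt: s / total for alt, s in response_strengths.items()}


# Hypothetical strengths: alternative x is three times as strong as y.
probs = choice_probabilities({"x": 3.0, "y": 1.0})
print(probs)
```

Because choice is probabilistic, the weaker alternative is still
selected some of the time, which is how such theories accommodate
choices that look inconsistent from one trial to the next.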

There is one further class of studies concerning the theory of
individual choice behavior that I should add at this point. It does
not fit into one of the twelve boxes that I described previously since
it is much more concerned with the methodology of this type of study
than with any particular study itself. These studies concern the
nature of measurement and theory construction in general. An important
review paper concerning the theory of measurement upon which many of
the mathematically oriented psychological studies are based is that
of Suppes and Zinnes [11]. In his paper entitled "On the Possible
Psychophysical Laws," R. D. Luce [3] characterizes the class of
functional forms that are meaningful when relating scale types of
different strengths to one another through empirical laws. In a later
paper (Luce [5]) he extends this initial work.

In the preceding paragraphs I have attempted to give the barest
of thumbnail sketches of which of the boxes of alternative theories
of individual choice behavior have been worked on. In the remainder
of this introduction I will give an overview of where the results
reported in this dissertation fit into that schema.

Section Two of this dissertation deals with choices involving
time. Empirical work concerning how people do in fact make choices
involving time has been the province of both psychologists and
economists. Economists have attempted to estimate consumption
functions empirically, and psychologists have attempted to look at a
number of factors that influence an individual's willingness to delay
gratification. In Part Two/One there is a relatively brief overview of
some of the psychological results. In Part Two/Two I have attempted to
provide a theory of choice involving time but no uncertainty. The
theory developed there rests on the observation that any discounting
procedure acts very much as a weighting procedure for utilities that
is quite analogous to the weighting procedure provided by subjective
probabilities. Thus an axiomatic system such as Savage's [10] provides
a formal basis for a theory of choice involving time but no
uncertainty. In Part Two/Two, then, the Savage axioms are
reinterpreted in a temporal context and the meaning of the theorems
for choice involving time is stated. The crucial independence
assumption that is required to obtain the numerical representation is
discussed, and it is pointed out that this independence axiom is much
less plausible in the inter-temporal context than it is in the
uncertainty context. The results obtained in Part Two/Two are then
compared with results previously obtained.

In Part Two/Three I attempt to outline an axiomatic framework for
choices that involve both time and uncertainty. The results obtained
there are rather limited and of two sorts. First, I look at choices
involving triples of the following form: (a, e, t). Here a is intended
to be a prize of some sort, perhaps an amount of money, e is an
uncertain event upon which it is conditional, and t is the time at
which it occurs. An example of such a triple would be the promise to
receive one thousand dollars in 1980 if Nixon is not reelected in 1972.
By extending some work of Tversky [12] I prove that choice among
triples of the sort just described can be shown to be reflected by
discounted expected utilities under rather plausible assumptions.
However, these assumptions are not sufficient to guarantee that the
probability weights attached to the random events form a probability
measure over the space of possible events. I next state axioms
concerning the more general inter-temporal choice problem under
uncertainty from which I conjecture that both a discount function and
a subjective probability measure can be derived.

Section Three of the dissertation deals with the relationship
between information and choice. Part Three/One is an essentially
normative study and Part Three/Two primarily descriptive. In Part
Three/One I attempt to show how a thoroughly subjectivistic concept of
probability can be used to encompass the inductive logics developed by
Carnap and Hintikka. This is done by showing that the inductive
systems they propose are special cases of a properly formulated
subjectivistic theory of induction based in a straightforward way on
Bayes' theorem.

Part Three/Two deals with statistical theories of learning of a
thoroughly descriptive sort. A broad range of theories of learning
is surveyed and many of the theories surveyed are considerably
generalized. The most important generalization is to allow for much
richer structures to be placed on the set of reinforcing events,
thereby bringing the theory in an important way much closer to
practical reality. Most of the theories of learning that are developed
in that part are also developed there for the situation in which there
is a continuum of response alternatives. This case is of particular
relevance to economics, as most price and quantity decisions are of
just this sort. A number of these theories could be tested in
simulated economic situations by analyzing the data that Professor M.
Shubik hopes to obtain from his computer-based economics of imperfect
competition course series. The closing pages of Part Three/Two suggest
a general framework within which problems of learning and inference
can be discussed.

Section Four of the dissertation comprises a number of empirical
studies related to the issues brought up in Section Three. Part
Four/One is an attempt to determine the actual structure of a
subject's beliefs under circumstances of "total" uncertainty.
Essentially, a subject is asked to specify his prior distribution for
an unknown probability when he is given no information concerning that
probability. These prior distributions are obtained for several
different numbers of states of the world. Part Four/Two reports on an
experiment performed on computer terminals at Stanford University to
test theories of paired-associate learning that attempt to describe
complicated structure placed on the set of reinforcing events. The
task set the subjects was sufficiently simple that subjects were able
to approach in their performance what would be predicted by a rather
complicated normative model; curves showing the actual versus
normative performance of the subjects are presented for a wide variety
of conditions. In Part Four/Three an attempt is made to investigate
information-seeking behavior of a particularly simple sort. Even in
the very simple case presented there, however, a normative model of
optimal decisions concerning whether or not to acquire information is
somewhat difficult to obtain. In contrast to the results of Part
Four/Two, it turns out that subjects' behavior is not particularly
well predicted by a normative model; nevertheless, there is some
increased tendency for subjects to acquire information when the value
of doing so is high.

The studies reported in more detail in what follows represent,
then, a somewhat heterogeneous collection of essays concerning one
aspect or another of the theory of individual choice behavior. Studies
reported are normative and descriptive, empirical and theoretical, and
both psychological and economic. It would be nice to report that
underneath this heterogeneity there is an underlying unity aside from
that of general subject matter. I fear, however, that there is no
such unity; my approach is more that of the fox than the hedgehog.

Section Two

CHOICES  INVOLVING  TIME 

If we classify any decision an individual must make according to,
first, whether or not it involves time and, second, whether or not it
involves uncertainty, each decision will fall within one of four
possible categories:

1.  Decisions  having  certain  outcomes  and  no  time  element, 

2.  Decisions  having  uncertain  outcomes  and  no  time  element, 

3.  Decisions  having  certain  outcomes  that  involve  time,  or 

4.  Decisions  having  uncertain  outcomes  that  involve  time. 

The theory of consumer demand traditionally deals with situation 1.
The four or five postulates for "rational" behavior under these
circumstances imply the existence of a utility function defined on the
set of outcomes (and unique up to an increasing monotonic
transformation); the individual chooses as though he were maximizing
utility according to this function, subject to a budget constraint.

The optimal procedure in situation 2 is presently a matter of
controversy. It is the author's belief that the axiom system of
Savage [28] (perhaps including modifications of Luce and Krantz*)
gives the clearest notion of rationality for decisions under
uncertainty. These axioms state conditions on an individual's
preferences which imply that he acts as though he were maximizing
expected utility against a unique probability distribution over the
states of nature. The utility function that is shown to exist is
unique up to a positive linear transformation.

* R. D. Luce and D. Krantz, "Conditional Expected Utility,"
unpublished manuscript.

This Section is concerned with the analysis of situations 3 and 4.
There appears to be a strong formal similarity between decisions
under uncertainty that have no temporal element and decisions that do
have a temporal element but involve no uncertainty. This similarity
is used to analyze situation 3; the intuitive basis for the similarity
is as follows. Utilities are calibrated in stronger-than-ordinal terms
by use of probabilities in the Savage theory, following the work of
Ramsey [26] and von Neumann and Morgenstern [35]. Consider three
outcomes, a, b, and c; and assume that a is preferred to b, and b to c.
Now assume that receiving b with certainty is indifferent to receiving
a with some probability p, and c with probability 1 - p. The magnitude
of p is, then, an index of how close in utility b is to a, relative to
how close c is to a. This observation is central to the development
of cardinal utility theory.
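
The calibration just described can be written out explicitly. The
following is a sketch that assumes the expected-utility representation
attributed above to the Savage theory; the symbol u for the utility
function is notation introduced here for illustration.

```latex
\begin{align*}
u(b) &= p\,u(a) + (1-p)\,u(c), \\
p &= \frac{u(b) - u(c)}{u(a) - u(c)}, \qquad
1 - p = \frac{u(a) - u(b)}{u(a) - u(c)}.
\end{align*}
```

Since a is preferred to c, u(a) > u(c) and both ratios are well
defined. A value of p near 1 places b close to a in utility. Both
ratios are unchanged when u is replaced by \alpha u + \beta with
\alpha > 0, so p calibrates utility only up to a positive linear
transformation.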

A similar intuitive construction can be made for decisions
involving time, but not uncertainty. Let a be preferred to b, and
assume that the individual has a positive rate of time preference,
i.e., he prefers to advance the consumption of relatively desirable
commodities. Though the individual prefers a to b, it is reasonable to
assume that there exists a time t* such that he would prefer receiving
b now to receiving a at a time further than t* in the future. What the
minimum value of t* is will depend both on how strongly the individual
prefers a to b and on the magnitude of his rate of time preference.
Like knowing probabilities, knowing the magnitude of the individual's
rate of time preference would enable us to calibrate cardinal
utilities. The problem is to separate out the effect on choice of time
preference from that of utility.
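
The dependence of t* on both the strength of preference and the rate
of time preference can be illustrated with a small computation. The
sketch assumes exponential discounting at a constant rate and strictly
positive utilities; neither assumption is made in the text at this
point, and the function name `indifference_delay` is introduced here
only for illustration.

```python
import math


def indifference_delay(utility_a, utility_b, rate):
    """Delay t* at which 'b now' is indifferent to 'a at time t*'.

    Assumes exponential discounting, u(b) = exp(-rate * t*) * u(a),
    with u(a) > u(b) > 0 and rate > 0. These are illustrative
    assumptions, not part of the author's axioms.
    """
    if not (utility_a > utility_b > 0 and rate > 0):
        raise ValueError("need utility_a > utility_b > 0 and rate > 0")
    return math.log(utility_a / utility_b) / rate


# A stronger preference for a lengthens t*; a higher rate shortens it.
base = indifference_delay(2.0, 1.0, rate=math.log(2.0))
stronger_preference = indifference_delay(4.0, 1.0, rate=math.log(2.0))
more_impatient = indifference_delay(2.0, 1.0, rate=2.0 * math.log(2.0))
# Different (utility ratio, rate) pairs can yield the same t*.
confounded = indifference_delay(4.0, 1.0, rate=math.log(4.0))
print(base, stronger_preference, more_impatient, confounded)
```

The last case shows the separation problem noted above: observing t*
alone cannot distinguish a strong preference combined with high
impatience from a weak preference combined with low impatience.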

In Part Two/Two of this dissertation the arguments outlined in
the preceding paragraph are treated more formally to provide a theory
of decisions involving time but no uncertainty. Part Two/Three
comprises an initial attempt to extend this analysis in a way that
accounts for uncertainty.

Before turning to that formal analysis, however, I summarize a
number of empirical studies reported in the psychological literature
concerning how individuals actually do make choices involving time.
These studies contain minimal theoretical development (at least of a
formal sort) and thus contrast with the primarily theoretical
development of economists. The results of these studies suggest,
moreover, that there are a variety of determinants of inter-temporal
choice behavior little considered by economists. I will further
discuss one or two of these problems for economic theory while
summarizing the psychological results in Part Two/One.

Part Two/One

PSYCHOLOGICAL  STUDIES  OF  CHOICE  INVOLVING  TIME 

Over the last 10 years or so a number of psychologists have been
studying how people make choices involving time. The central theme of
research in this particular area has concerned the determinants of an
individual's willingness to choose a smaller immediate reward over a
larger later reward. In this part of my dissertation I will review
some of the findings of this school of research, then, in the second
section of this part, look at some of the determinants of willingness
to delay gratification. Finally I sketch very briefly an experiment
that I hope to perform at some later time to look in more detail
at methods of obtaining a quantitative measure of time preference.

I.  WILLINGNESS  TO  DELAY  GRATIFICATION  AND  PUNISHMENT 

Professor Walter Mischel of the Stanford Psychology Department has
been the researcher most interested in examining people's willingness
to delay gratification and reward. He has been publishing papers in
this general area since the late 1950s; however, I will in this part
review only some of his most recent work which, by and large,
supersedes that done previously. After reviewing three papers of his I
will discuss briefly some of the implications of those findings for
the type of economic theory of utility and time preference discussed
in Part Two/Two.

Mischel [20] provides a fairly extensive survey of the work done
in this area prior to 1966. One rather systematic early finding is
that the likelihood that a subject chooses an early smaller reward
over a delayed but larger reward increases as the time interval before
receiving the larger delayed reward increases. The same studies
further found that willingness to delay gratification for a later
reward depends on the relative magnitudes of the two rewards involved,
very much as one would intuitively expect. The bulk of this paper by
Mischel is dedicated to reporting results of five experiments that he
and his co-workers had performed over the preceding several years.

In their first study they examined the effects of making
attainment of the larger, later reward contingent on successful
performance of an intermediate task. They found, not surprisingly,
that the more successful people had been in previously given similar
tasks, the more likely they were to be willing to delay for a larger
reward. Also, subjects with a fairly low level of self-confidence were
rather more apt to take immediate but lower rewards. Unfortunately,
however, for the purpose of studying the effects of pure time
preference, the extraneous variables in this experiment (uncertainty
about successful completion of a task and the potential disutility of
actually performing it) considerably confused the picture.
Nevertheless the direction of the effects is very much as one would
intuitively predict.

A second class of experiments looked at how uncertainty concerning
whether or not the later reward would actually be attained affected
willingness to delay gratification. This same sort of effect is
examined in more detail in later experiments reported in Mischel and
Grusec [21].

Once again findings were very much as one would intuitively hope.
Increasing the probability that a subject would in fact obtain a later
but larger reward resulted in an increased likelihood that the subject
would choose that option. The theoretical formulations concerning the
reasons for the existence of impatience employed by Mischel and his
co-workers at this time were primarily centered around this
uncertainty aspect; the lesser probability of in fact attaining more
distant rewards was construed as the primary reason for choosing
smaller immediate gratification. This study reports, however, no
attempt to quantify attitudes towards time preference or uncertainty,
nor does it attempt to look at trade-offs between time preference and
uncertainty.

A third class of experiments reported in this major article by
Mischel looked at attempts to modify subjects' willingness to choose
delayed gratifications. They were able to obtain rather large
modifications in willingness to delay rewards with both live and
symbolic models of rather different behavior. (In the symbolic models
the subjects were simply told about the behavior of others who had to
make choices involving time.) The fourth and fifth experiments
reported in this survey by Mischel concerned how various forms of
behavior of models and characteristics of models influenced aspects of
a subject's behavior other than choice involving time.

As previously mentioned, the primary reason ascribed by Mischel
and his co-workers for the existence of time preference was
uncertainty. They held this view through probably 1967, and many of
the experiments performed up to that time had uncertain later rewards
as well as other intervening variables mixed into the experiments in a
way that confused the interpretation and the results somewhat. In a
very recently reported study by Mischel, Grusec, and Masters [22] the
existence of pure time preference is given a more central role, and
they designed a set of experiments to look at just that effect. Again
their qualitative results are that the more a reward is delayed the
less likely it is to be chosen over a smaller immediate reward.
However, there is one other aspect of their work that extends some of
the results reported in Mischel and Grusec and that is of considerable
importance here: they also looked at individuals' willingness to delay
punishments. The results they find here are rather inconsistent with a
theory of inter-temporal choice based on discounting future utilities
or disutilities. First, among adult subjects, they find that the
length of delay time does not affect willingness to put off
punishment; adults in general preferred immediate punishments to more
delayed ones no matter what the length of the time interval. For
children, on the other hand, there seems to be no systematic
relationship between temporal considerations and punishment. Sometimes
they will choose the delayed punishment, sometimes not. Apparently
these studies by Mischel and his co-workers are the first that look in
any detail at punishment and its effect on temporal choice when the
time intervals are of any length. They do discuss some previous
results, however, for very short delays of punishment. For example,
they mention a study of Cook and Barnes [2] in which adults were
allowed to choose how long to delay an inevitable small shock. The
delay times available for choice were only on the order of fractions
of a minute. Almost invariably in these circumstances adults chose an
immediate shock rather than delaying it.

There are a number of things about these findings that are
unsettling for economic theory. First, and in a sense more minor,
there are a number of seemingly exogenous factors that do influence
choice behavior under these circumstances. For example, a subject who
has just received a reward is more willing to undergo immediate
punishment than he would otherwise be. Also, subjects' behavior is
somewhat easily modified by observation of alternative behaviors. In
addition there was some evidence that the order in which subjects made
a number of choices involving time would affect the outcome of their
choices.

However, what I think is the most fundamental difficulty posed by
these results is that subjects do seem to behave very differently with
respect to delaying rewards than they do with respect to delaying
punishments. This seems to me to pose a very fundamental difficulty for
the theory of utility and time preference that is formally sketched in
Part Two/Two of this dissertation. According to the theory presented
there, subjects with a positive rate of time preference should prefer
to delay punishment as much as possible. This follows from the implicit
assumption that the point events that are studied in these experiments
represent simply reversals of two events within a time stream. That
is, there is the event of doing nothing and there is also the event of,
say, receiving a small shock, and these two events are reversed in the
time stream. Since the utility of doing nothing is higher than that
of receiving a small shock, according to the standard utility analysis,
anyone with a positive rate of time preference would wish to delay the
shock as much as possible. Yet this is not observed. What this
suggests is that there is some sort of natural zero to the utility
level, a result inconsistent with the economist's general result that
utility is unique only up to a positive linear transformation. For
under a positive linear transformation there is, of course, no natural
zero level. The critical result is that the behavior of the subject
concerning events that have a utility below the zero level differs
qualitatively from his behavior concerning events having a utility
above that zero level.

One intuitive way to look at this sort of thing is to assume that
any particular event does not have utility simply at the time that it
occurs, which is then discounted back to the present in order for a
person to make a decision. Rather, any event generates a time stream
of utility and each portion of that time stream is discounted to the
present. The cause of this time stream of utility is a memory of past
events and anticipation of future ones. (This way of looking at past
events having an influence on present utility is rather different from
that advanced by Charles Wolf in a recent paper. Wolf [37] is primarily
concerned with looking at how our past commitments and actions can
influence the utility of what we do now. What I am suggesting here,
on the other hand, is simply that we continue to enjoy now the memories
of pleasant past events and occasionally to blush over past mistakes.)

If we do assume that events cause these utility streams in time,
then, given that there is some sort of natural zero to our utility
function, we can postulate rather different time streams for those
events with positive from those with negative utility. Intuitively I
expect two sorts of things. First, people will tend to more readily
forget unpleasant events than pleasant ones. Thus we would expect the
disutility stream resulting from an unpleasant event to fall
off more rapidly than the utility stream generated by a pleasant event
of the same absolute magnitude in some sense. Second, future
unpleasant events tend to cause, I intuitively feel, more present fear and
anxiety than future pleasant events cause present pleasure of
anticipation. Thus, the disutility stream of a future unpleasant event
should rise more rapidly than does the utility stream of a future
pleasant event.

What would be desirable would be to represent these utility
distributions by functions having their mode at the time of occurrence
of the event in question and that distribute the utility from the event
over an interval of time. Further, that distribution should be skewed
toward  the  present  for  undesirable  events  and  more  toward  the  past  for 
desirable  events.  Clearly,  however,  a  good  deal  more  of  both  theo¬ 
retical  and  empirical  work  needs  to  be  done  in  order  to  make  much  prog¬ 
ress  with  these  notions. 
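
The contrast between the point-event analysis and the utility-stream
view can be made concrete with a small numerical sketch. Everything in
it is an assumption introduced for illustration: the geometric shape of
the anticipation and memory tails, the parameter values, and the
function names `point_event_pv` and `stream_pv`. None of it is estimated
from the studies cited above.

```python
# Hypothetical illustration: all parameter values are assumptions,
# not estimates from the experiments discussed in the text.

def point_event_pv(magnitude, when, beta):
    """Standard analysis: utility accrues only in the period of the event."""
    return magnitude * beta ** when

def stream_pv(magnitude, when, beta, anticipation, memory, horizon=200):
    """Event generates a utility stream with its mode at period `when`.

    The stream decays geometrically before the event (anticipation) and
    after it (memory); each period's utility is discounted to period 0.
    """
    total = 0.0
    for s in range(horizon):
        if s < when:
            weight = anticipation ** (when - s)   # dread or pleasant anticipation
        else:
            weight = memory ** (s - when)         # fading memory of the event
        total += beta ** s * magnitude * weight
    return total

BETA = 0.95          # assumed per-period discount factor (positive time preference)
shock = -1.0         # assumed disutility of the shock, below the natural zero
now, later = 0, 5    # immediate shock versus a five-period delay

point_now = point_event_pv(shock, now, BETA)
point_later = point_event_pv(shock, later, BETA)
# Assumed asymmetry for an unpleasant event: strong anticipation, weak memory.
stream_now = stream_pv(shock, now, BETA, anticipation=0.9, memory=0.3)
stream_later = stream_pv(shock, later, BETA, anticipation=0.9, memory=0.3)

print("point-event model prefers delay:", point_later > point_now)
print("stream model prefers immediate shock:", stream_now > stream_later)
```

Under these assumed parameters the point-event model favors delaying
the shock, while the stream model favors taking it immediately, because
each period of delay adds a period of discounted dread. Other parameter
choices can reverse this, so the sketch shows only that the stream view
is capable of accommodating the observed behavior.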

II.  FACTORS  INFLUENCING  AN  INDIVIDUAL'S  CAPACITY  TO  DEFER  GRATIFICATION 


Let me begin by quoting the introspective and somewhat value-laden
but interesting comments of Irving Fisher concerning the determinants
of impatience among individuals. On page 89 of The Theory of Interest,
Fisher [6] asserts:

Impatience  for  income,  therefore,  depends  for  each  individual  on 
his  income,  on  its  size,  time  shape,  and  probability;  but  the 
particular  form  of  this  dependence  differs  according  to  the  var¬ 
ious  characteristics  of  the  individual.  The  characteristics  which 
will tend to make his impatience great are: (1) short-sightedness,
(2) a weak will, (3) the habit of spending freely, (4) emphasis
upon the shortness and uncertainty of his life, (5) selfishness,
or the absence of any desire to provide for his survivors, (6)
slavish following of the whims of fashion. The reverse conditions
will  tend  to  lessen  his  impatience;  namely,  (1)  a  high  degree  of 
foresight,  which  enables  him  to  give  to  the  future  such  attention 
as  it  deserves;  (2)  a  high  degree  of  self  control,  which  enables 
him  to  abstain  from  present  real  income  in  order  to  increase  fu¬ 
ture  real  income;  (3)  the  habit  of  thrift;  (4)  emphasis  upon  the 
expectation  of  a  long  life;  (5)  the  possession  of  a  family  and  a 
high  regard  for  their  welfare  after  his  death;  (6)  the  indepen¬ 
dence  to  maintain  a  proper  balance  between  outgo  and  income  re¬ 
gardless  of  Mrs.  Grundy  and  the  high-powered  salesmen  of  devices 
that  are  useless  or  harmful,  or  which  commit  the  purchaser  beyond 
his  income  prospects. 

There appears to be little evidence available at the present time
in the psychological literature to either substantiate or refute most
of the suggestions that Fisher makes, though there does exist
speculation even in early psychoanalytic literature (see Brenner [1, 50-52]).
However, concerning two potential determinants of willingness to save
there is some evidence, although not always clear-cut in its results.

The  two  areas  for  which  there  does  exist  evidence  concern  the  rela¬ 
tionship  of  "achievement  motivation"  to  willingness  to  postpone  grati¬ 
fication  and  the  relationship  of  socio-economic  class  to  this. 

I have been able to find two studies that relate socio-economic
class to willingness to postpone gratification. The first, reported
by Cameron and Storm [1A], looked at achievement motivation and income
in middle class, working class, and Canadian Indian children. They
found that a middle class child was more likely than Indian or working
class children of the same age to prefer large delayed rewards to smaller
immediate ones. In a rather more substantial study, however, Straus
[29] obtained different results. He tested willingness to defer
gratification in a population of over three hundred male high school
students. One of the three hypotheses that he was testing was: "the higher
the socio-economic level, the greater the tendency to defer gratification."


Straus was unable to find any evidence to support the hypothesis that
there is a positive correlation between socio-economic status and
willingness to defer gratification.

A  good  fraction  of  the  psychologists  involved  in  study  of  willing¬ 
ness  to  defer  gratification  have  worked  under  the  influence  of  the  group 
of  psychologists  currently  studying  "achievement  motivation".  In  a  re¬ 
cent  brief  survey  textbook  entitled  Motivation  and  Emotion,  Murray  [23] 
lists  five  broad  classes  of  human  motivations,  such  as  sex,  hunger,  and 
thirst,  etc.  One  of  these  classes  was  social  motivations;  under  that 
class  he  lists  twenty  different  types  of  social  motivations.  One  of 
these twenty is achievement motivation, or need for achievement; this
particular  type  of  motivation  has  been  much  popularized  by  the  wide 
success  of  the  book  entitled  The  Achieving  Society,  by  David  McClelland 
[19].  McClelland's  thesis  is  that  when  a  reasonably  large  number  of 
people  in  a  society  for  some  reason  or  another  acquire  a  large  need  for 
achievement,  then  things  begin  to  happen  in  that  society — particularly 
entrepreneurial activity leading to economic growth. McClelland's
arguments have been rather vigorously challenged in some of the economic
journals, although, I think, there is general agreement that his
focusing on the motivations of individuals within the society leads to an
important  way  of  looking  at  the  determinants  of  economic  growth.  On 
the  other  hand,  a  number  of  the  psychological  premises  behind  his  work 
have  remained  relatively  unchallenged;  in  particular,  his  focusing  al¬ 
most  exclusively  on  achievement  motivation  to  the  exclusion  of  a  tre¬ 
mendous  variety  of  other  possible  motivations  and  his  failure  to  look 
at  the  correlations  among  motivations  must  be  counted  as  a  serious 
shortcoming  in  his  work.  It  is  sufficient  to  note  here,  however,  that
one  of  the  results  of  his  book  has  been  to  stimulate  a  good  deal  of 
research  concerning  the  attitudes  of  people  with  high  levels  of  achieve¬ 
ment  motivation  toward  delay  of  gratification.  On  Pages  324  through 
329  McClelland  summarizes  some  of  his  results  concerning  attitudes  to¬ 
ward  time  of  people  with  high  achievement  motivation  and  a  more  up-to- 
date  summary  of  some  of  these  results  may  be  found  on  Pages  41  through 
45  of  Heckhausen  [7].  Three  separate  studies  cited  by  Heckhausen  sup¬ 
port  the  notion  that  measures  of  achievement  motivation  are  positively 
correlated  with  willingness  to  defer  gratification.  This  result  is 
also borne out by the previously cited paper of Straus [29]. The third
of the hypotheses that he was testing was "the greater the tendency to
defer gratification, the higher the performance on two measures of the
'achievement syndrome'." He found some evidence to support this
hypothesis and concludes his paper with the following comment: "Learning
to  defer  need  gratification  seems  to  be  associated  with  achievement  at 
all  levels  of  the  status  hierarchy  represented  in  this  sample,  and 
hence  can  probably  best  be  interpreted  as  one  of  the  personality  pre¬ 
requisites  for  achievement  roles  in  contemporary  American  society." 

I  think  that  these  results  must  be  considered  primarily  as  qualitative 
tendencies  of  association  rather  than  any  explicit  precise  correlational 
findings.  One  reason  for  this  is  the  essentially  ordinal  nature  of 
measures  of  achievement  motivation. 

This concludes my comments on work that has been previously done
by psychologists measuring time preference and relating it to various
characteristics of individuals. In the work that I have read so far
by these psychologists I have seen no reference at all to the rather
extensive economic literature concerning time preference nor any
serious attempt to formulate explicit quantitative models of the phenomena
being investigated. It does seem to me that some interesting
experimental results could be obtained by designing experiments in terms of
the theoretical structure developed in the next section of this paper
and the experimental techniques utilized by Tversky [32, 33] in the
formally very similar problem of measuring subjective probabilities.
What I would hope to do in these experiments is, first, to demonstrate a
capability to provide a relatively clear quantitative measure of time
preference, and, second, to attempt to relate this measure in some
systematic way to various personality and socio-economic variables
associated with the individual. One question that will have to be
investigated is whether or not an individual's time preference can be
represented by a single rate (necessarily assumed to be constant) or
whether some vector of numbers will be needed to describe his
discounting pattern for different time intervals. To measure personality
characteristics I would plan to work in collaboration with Professor
Andrew Comrey of UCLA, who has developed over the last ten years a
rather comprehensive personality inventory. Questionnaires and
selective sampling techniques would be used to gain the socio-economic
background information and to select the appropriate populations from
which to obtain that information.

Part  Two/Two

FORMAL  THEORY  OF  DECISIONS  UNDER  CERTAINTY  INVOLVING  TIME 

I.  THE  AXIOMS  OF  THE  THEORY

Because of the similarity between the problem considered here and
that of decisions under uncertainty, Savage's axioms [28] are
reinterpreted in this context below.

The basic subject matter of the theory is the following:

1. The set F of all points in time from some initial time into
the future,

2. A set T of time periods which are subsets of F such that
$F \in T$; $\emptyset \in T$; if $t_i \in T$, then $F - t_i \in T$; and
if $t_i, t_j \in T$, then $t_i \cap t_j \in T$ and $t_i \cup t_j \in T$,

3. A set X of consequences whose elements are commodity vectors,

4. A set D of decisions, each of which is a function from F into
X (D is assumed to include all constant decisions, i.e., decisions
such that for some $x_i$ and for all $t \in F$, $d(t) = x_i$), and

5. A relation $\preceq$ on the set D.

The notation $d \preceq e$ is interpreted as "d is not preferred to e."
If $d \preceq e$ and $e \preceq d$, then the two decisions will be said
to be indifferent, denoted $d \sim e$. If $d \preceq e$, and not
$d \sim e$, then e will be said to be strictly preferred to d, denoted
$d \prec e$. The symbols $\succeq$ and $\succ$ are defined in the
obvious way.

The axioms are listed below and then described in turn.

Axiom 1. For all $d, e, f \in D$, $d \preceq e$ and $e \preceq f$ imply
$d \preceq f$. For all $e, f \in D$, $e \preceq f$ or $f \preceq e$.

Consider a time period B in T. A decision d is said to "agree"
with e during B if $d(t) = e(t)$ for all $t \in B$.

Axiom 2. If $B \in T$ and if for $d, e, d', e' \in D$, the following
hold:

1. In $F - B$, d agrees with e and d' agrees with e'

2. In B, d agrees with d' and e agrees with e'

3. $d \preceq e$

then $d' \preceq e'$.

Several new notions must now be introduced. If decisions d and
e are modified so as to agree in $F - B$ (i.e., except during B) and if,
after modification, $d \preceq e$, then $d \preceq e$ during B. (This
definition is legitimate by Axiom 2; that is, it does not matter what d
and e are modified to during $F - B$.) A time period B will be said to be
irrelevant if for all $d, e \in D$, $d \sim e$ during B. A preference
relation $\preceq_c$ on the set of consequences X can be defined in terms
of $\preceq$ in the following way: If $x_i, x_j \in X$, then
$x_i \preceq_c x_j$ if and only if, for constant decisions $d_i$ and $d_j$
such that, for all t, $d_i(t) = x_i$ and $d_j(t) = x_j$, we have
$d_i \preceq d_j$.

Axiom 3. If for $t \in B$, $d(t) = x$ and $d'(t) = x'$, and if B is
relevant, then $d \preceq d'$ during B if and only if $x \preceq_c x'$.

Axiom 4. If $f, f', g, g' \in X$; $A, B \in T$;
$f_A, f_B, g_A, g_B \in D$; $f' \prec_c f$, $g' \prec_c g$;

$f_A(t) = f$, $g_A(t) = g$ for $t \in A$,

$f_A(t) = f'$, $g_A(t) = g'$ for $t \in F - A$,

$f_B(t) = f$, $g_B(t) = g$ for $t \in B$,

$f_B(t) = f'$, $g_B(t) = g'$ for $t \in F - B$;

and $f_A \preceq f_B$; then $g_A \preceq g_B$.

Axiom 5. For some $x, x' \in X$, $x \prec_c x'$.

A temporal partition is a subset T* of T such that for every
$t \in F$ there is exactly one $t_i \in T^{*}$ such that $t \in t_i$. A
regular temporal partition is a temporal partition T** such that the time
periods in T** are intervals and of equal length.

Axiom 6. Suppose $x \in X$ and $d, e \in D$ with $d \prec e$. There
exists a temporal partition such that if d or e is modified on any time
period of the partition to take the value x for all t in that time period,
other time periods being undisturbed, then the modified d remains
inferior to e, or d remains inferior to the modified e, as the case may be.

These, then, are the axioms of the theory. Axiom 1 is the obviously
necessary requirement that $\preceq$ be a weak order. Axiom 2 is the
"sure-thing principle" in the context of decisions under uncertainty; here
it acts as a rather strong independence assumption. (Axiom 2 is discussed
in more detail in Section IV.) Axiom 3 simply states that if one
consequence is inferior to another and two decisions are everywhere
identical except during one relevant time period such that during that time
period, the first decision has the inferior consequence and the second
the superior one, then the first decision is inferior to the second
one.

Axiom 4 makes possible an ordering among time periods;
"$A \preceq_o B$" can be read "A is more discounted than B." Consider two
consequences x and y such that x is definitely preferred to y. Let $d_A$
be a decision such that x is the result during A and y is the result
during $F - A$; $d_B$ is similarly defined. If A and B are time periods of
equal length with A being in the near future and B in the far future, and
the individual has a positive rate of time preference, then we would
expect $d_B \prec d_A$. Or if A and B were at about the same time but B
was considerably shorter than A, we would expect $d_B \prec d_A$. Assume
that if for one x and y pair with $y \prec_c x$ we have
$d_B \preceq d_A$, then for all x and y such that $y \prec_c x$,
$d_B \preceq d_A$. We would then be justified in defining $\preceq_o$ in
the following way: $B \preceq_o A$ if and only if $d_B \preceq d_A$.
Axiom 4 asserts this invariance of the ordering $\preceq_o$ with respect
to the x and y chosen.

Axiom  5  is  simply  an  assumption  of  nontriviality;  only  Buridan's 
ass  would  have  difficulty  were  Axiom  5  to  fail. 

Axiom  6  is  an  assumption  that  temporal  partitions  can  be  made 
exceedingly  fine. 

II.  THE  PRINCIPAL  THEOREMS

The principal theorems of Part Two/Two follow directly from
reinterpretation  of  theorems  in  Ref.  28.  Hence,  proofs  will  be  out¬ 
lined  only  very  briefly  here.  All  of  these  theorems  assume  Axioms  1 
through  6. 

Theorem 1. There exists a unique real-valued function $\delta$ defined
on T such that if $A, B \in T$:

1. $\delta(A) \le \delta(B)$ if and only if $A \preceq_o B$,

2. If A is irrelevant, $\delta(A) = 0$,

3. $\delta(F) = 1$, and

4. If $A \cap B = \emptyset$, $\delta(A \cup B) = \delta(A) + \delta(B)$.

The proof of this theorem rests on noting that $\preceq_o$ acts like a
qualitative probability defined on T. Axiom 6 insures that this
qualitative probability is fine and tight; that in turn implies the
existence of a probability measure that strictly agrees with the
qualitative probability. This probability measure is interpreted here as
the function $\delta$.

The  following  corollary  to  Theorem  1  is  perhaps  more  useful  where 
time  preference  is  concerned. 


Corollary 1. If T** is a regular temporal partition with elements
$t_1, t_2, \ldots$, arranged in order, then there exists a unique function
$\Lambda$ defined on T** such that:

1. $\Lambda(t_1) = 1$,

2. $\Lambda(t_i) \le \Lambda(t_j)$ if and only if $t_i \preceq_o t_j$, and

3. $\sum_{i=1}^{\infty} \Lambda(t_i) < \infty$.

A function $\Lambda$ satisfying conditions 1 through 3 will be called a
discount function. The proof of existence in Corollary 1 follows from
Theorem 1 and Axiom 6, which will give the countable additivity required
for part 3. The uniqueness follows from Theorem 1, establishing
uniqueness up to multiplication by a positive constant, and the
normalization of part 1.

There  are  a  number  of  alternative  axiomatizations  for  insuring  that 
a  probability  measure  exists  that  strictly  agrees  with  a  qualitative 
probability (see Fishburn [5]). However, it appears likely that applying
slightly  different  assumptions,  under  which  essentially  the  same  con¬ 
clusions  would  follow. 

Let us now examine the existence of a utility function. A
decision d will be defined as constant on a time period A if there exists
a consequence $x \in X$ such that $d(t) = x$ for all $t \in A$. From now
on, we shall consider only regular temporal partitions, T**, where the
available decisions are constant on elements of the partition. It is clear
that if this is so, there is no ambiguity in writing $d_i(t_j)$ if
$t_j \in T^{**}$.

A utility against $\Lambda$ is a real-valued function U on X with the
property that if all $d \in D$ are constant on the elements
$t_1, t_2, \ldots$ of T**, and $\Lambda$ is a discount function on T**,
then for all $d_i, d_j \in D$ the following is true:

$$d_i \preceq d_j \ \text{ if and only if } \ \sum_{k=1}^{\infty} \Lambda(t_k)\, U[d_i(t_k)] \le \sum_{k=1}^{\infty} \Lambda(t_k)\, U[d_j(t_k)].$$

Theorem 2. If T** is a regular temporal partition, $\Lambda$ is a
discount function on T**, and all decisions are constant on elements of
T**, then Axioms 1 through 6 imply that there exists a utility against
$\Lambda$.†

Theorem 3. If U is a utility against $\Lambda$, then U* is a utility
against $\Lambda$ if and only if $U^{*} = aU + b$, where b is any number
and a is any strictly positive number.†

The present utility of a decision d that is constant on the
elements $t_1, t_2, \ldots$ of a regular temporal partition is thus
defined in the following way:

$$PU(d) = \sum_{j=1}^{\infty} \Lambda(t_j)\, U[d(t_j)],$$

given a discount function $\Lambda$ and a utility U.

In summary, then, Axioms 1 to 6 suffice to prove the existence of
measures of time preference, $\Lambda$, and utility, U, such that one
decision is preferred to another if and only if its present utility is
greater.

† Theorems 2 and 3 are proven in Ref. 28 and little altered there
from the original proof of von Neumann and Morgenstern [35].
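
The present-utility representation and the uniqueness result of
Theorem 3 can be illustrated numerically. The following is a minimal
sketch under stated assumptions: the horizon is truncated to four
periods, and the discount weights, the square-root utility, the two
decisions, and the names `present_utility`, `U`, and `U_star` are all
hypothetical choices made for illustration.

```python
def present_utility(decision, discount, utility):
    """Finite-horizon version of PU(d) = sum_j Lambda(t_j) * U[d(t_j)]."""
    return sum(w * utility(x) for w, x in zip(discount, decision))

# Hypothetical discount function on a four-period regular partition,
# normalized so that the first period has weight 1 (Corollary 1, part 1).
discount = [1.0, 0.8, 0.6, 0.5]

def U(x):
    """Assumed utility of per-period consumption."""
    return x ** 0.5

def U_star(x):
    """Positive linear transformation a*U + b with a = 3 > 0 and b = -7."""
    return 3.0 * U(x) - 7.0

# Two decisions, each constant on the elements of the partition.
d = [9.0, 4.0, 4.0, 1.0]
e = [4.0, 4.0, 9.0, 4.0]

pu_d = present_utility(d, discount, U)
pu_e = present_utility(e, discount, U)
pu_d_star = present_utility(d, discount, U_star)
pu_e_star = present_utility(e, discount, U_star)

print("d not preferred to e under U: ", pu_d <= pu_e)
print("d not preferred to e under U*:", pu_d_star <= pu_e_star)
```

Because the discount weights have a finite sum, replacing U by aU + b
with a > 0 rescales every present utility by a and shifts it by the same
constant, so the ordering of d and e is unchanged.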


III.  ADDITIONAL  RESULTS 


This analysis has produced several additional results. Let us
first consider conditions that will insure a constant rate of time
preference. Here, a constant rate of discount defined on a regular
temporal partition T** means simply that if the elements of T** are,
in order, $t_1, t_2, \ldots$, then the discount function $\Lambda$ has the
property that $\Lambda(t_{i+1}) = \alpha \Lambda(t_i)$ for some constant
$\alpha$ (necessarily $< 1$) and for $i = 1, 2, \ldots$. If D is
a set of decisions constant on elements of a regular temporal
partition T**, then the relation $\preceq$ on D is said to be stationary
if whenever the elements $d, e \in D$ are such that $d(t_1) = e(t_1)$ and
$d \preceq e$, then the decisions d' and e' formed by deleting the
first-period consequences in d and e and advancing the other consequences
by one time unit (e.g., $d'(t_i) = d(t_{i+1})$) are such that
$d' \preceq e'$.

Theorem 4. If T** is a regular temporal partition, if the members
of D are constant on elements of T**, and if $\preceq$ is stationary, then
there is a constant rate of time preference.

The proof of Theorem 4 is analogous to a similar proof in
Koopmans [10].
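
The link between stationarity and a constant rate of discount can be
checked on a small example. This is a sketch only, not a proof: it
assumes a three-period horizon, a linear utility U(x) = x, and padding
of the advanced decision with a zero consequence in the vacated last
period. The weights and the names `advance` and `ordering_preserved` are
hypothetical.

```python
def present_utility(decision, weights):
    """Discounted sum with linear utility U(x) = x (an assumption)."""
    return sum(w * x for w, x in zip(weights, decision))

def advance(decision, pad=0.0):
    """Delete the first-period consequence; move the rest one period earlier."""
    return decision[1:] + [pad]

def ordering_preserved(d, e, weights):
    """True if 'd not preferred to e' has the same truth value before and
    after advancing, for decisions that share their first-period consequence."""
    assert d[0] == e[0]
    before = present_utility(d, weights) <= present_utility(e, weights)
    after = (present_utility(advance(d), weights)
             <= present_utility(advance(e), weights))
    return before == after

alpha = 0.5
geometric = [alpha ** i for i in range(3)]  # 1, 0.5, 0.25: constant rate
irregular = [1.0, 0.5, 0.45]                # ratio changes from 0.5 to 0.9

d = [2.0, 7.0, 0.0]
e = [2.0, 0.0, 10.0]

print("constant rate preserves ordering:    ", ordering_preserved(d, e, geometric))
print("non-constant rate preserves ordering:", ordering_preserved(d, e, irregular))
```

With the geometric weights, the difference in present utility between
d and e is simply multiplied by alpha when both decisions are advanced,
so the ordering cannot change. With the irregular weights the ordering
of this particular pair reverses, which is the kind of violation of
stationarity that Theorem 4 rules out.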

Another result from the theory of choice under uncertainty that
can be applied to the intertemporal context is one due to Pfanzagl [24].
Let the elements of X be represented on a real continuum, e.g., the
values of x could be dollar-consumption income per unit time.
Consider a relation $\preceq$ on D that satisfies Axioms 1 through 6. For
every $d \in D$, define $d' = d + x_0$ for some $x_0 \in X$; that is, the
value of every alternative is being enhanced by, say, $x_0$ dollars per
unit time in every time period. Pfanzagl's consistency principle asserts
that the preference relation on d' is the same as that on d: Adding a
constant to every time period of every decision in no way alters the
preference ordering among the decisions. In some ways a plausible
assumption, the consistency principle yields the following very
restrictive result:

Theorem 5. If a choice structure satisfies Axioms 1 through 6
and Pfanzagl's consistency principle, and if X is an interval of a
real continuum, then U has one of the following two forms:

$$U(x) = ax + b$$
or
$$U(x) = a\lambda^{x} + b,$$

where a, b, and $\lambda$ are constants with $a \neq 0$ and $\lambda > 0$.
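
The restrictiveness of the consistency principle can be seen in a
two-period example. The sketch below is illustrative only: the discount
weights, the decisions, the size of the added constant, and the three
utility functions are assumptions, with the square root standing in for
a utility that is neither linear nor exponential.

```python
import math

def present_utility(decision, weights, utility):
    """Finite-horizon discounted sum of per-period utilities."""
    return sum(w * utility(x) for w, x in zip(weights, decision))

def prefers_e(d, e, weights, utility):
    """True if e is strictly preferred to d under the given utility."""
    return present_utility(d, weights, utility) < present_utility(e, weights, utility)

def shifted(decision, x0):
    """Add the same constant x0 to every time period of a decision."""
    return [x + x0 for x in decision]

def linear(x):
    return 2.0 * x + 1.0            # U(x) = ax + b with a = 2, b = 1

def exponential(x):
    return -1.0 * 0.99 ** x + 1.0   # U(x) = a*lam**x + b with a = -1, lam = 0.99

weights = [1.0, 0.5]                # hypothetical two-period discount function
d, e = [1.0, 0.0], [0.0, 3.0]
x0 = 100.0

for name, u in [("linear", linear), ("exponential", exponential),
                ("square root", math.sqrt)]:
    same = (prefers_e(d, e, weights, u)
            == prefers_e(shifted(d, x0), shifted(e, x0), weights, u))
    print(name, "ordering unchanged by adding x0:", same)
```

For the linear and exponential forms the ordering of d and e survives
the addition of $x_0$, as Theorem 5 requires. For the square-root
utility the ordering of this pair reverses, so that utility is
incompatible with the consistency principle.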

The  import  of  Pfanzagl's  result  is  illuminated  by  Krantz  and 
Tversky's  [12]  proof  that  the  consistency  principle  is  a  consequence 
of  axioms  concerning  how  adding  to  or  subtracting  from  the  outcomes 
of decisions would affect the relative desirability of those
decisions.

LaValle [14] has generalized Pfanzagl's results to a situation
he calls multivariate constant risk aversion. If the elements of X
are indexed on a real continuum, and there are a finite (this could
be extended to denumerable) number of time periods, then LaValle's
results can be used to obtain (fairly restrictive) sufficient
conditions for $\preceq$ to be represented by a utility function of the
form:

$$PU(d) = e^{cd}, \quad \text{or} \quad cd, \quad \text{or} \quad -e^{-cd},$$

where d is a vector whose components specify the amount received in
each time period, and c is a column vector with nonnegative components.
The present utility is unique up to a positive linear transformation.

IV.  THE  ASSUMPTION  OF  INDEPENDENCE

An  assumption  of  independence  is  implied  in  Axioms  2  and  4,  which 
assert  that  there  is  no  complementation  or  substitution  across  time 
periods  and  that  there  can  be  no  preference  for  variety  for  its  own 
sake.  These  assumptions  are  necessary  both  to  obtain  a  measure  of  time 
preference  in  the  first  place  and  to  calibrate  utilities,  given  the 
discount  function. 

Some  of  the  stronger  disadvantages  of  these  assumptions  can  be 
avoided  in  the  following  ways:  First,  the  elements  of  the  consumption 
set  X  may,  as  previously  noted,  be  regarded  as  access  to  rather  than 
acquisition  of  commodities.  For  example,  buying  a  new  car  and  keeping 
it  for  four  years  would  be  regarded  in  this  scheme  as  access  to  a  new 
car  the  first  year,  a  one-year-old  car  the  second  year,  etc.  This 
approach  avoids  some  aspects  of  material  interdependence;  nevertheless, 
the  possibility  that  consumption  during  one  time  period  can  affect  the 
utility  of  consumption  in  other  time  periods  cannot  be  ruled  out.  The 
problem  of  variety  can  be  partly  mitigated  by  allowing  the  components 
of  members  of  the  set  X  to  be  mixtures  of  the  form  "in  New  York  three- 
fourths  of  the  time,  in  Paris  one-fourth."  Extensive  use  of  this  ap¬ 
proach  would,  however,  make  matters  hopelessly  unwieldy. 

Economists  traditionally  favor  nonrestrictive  (i.e.,  weak)  assump¬ 
tions;  as  a  consequence,  they  generally  achieve  weak  results.  To  obtain 
the fairly strong result that the effects of time preference and utility
may be separated and measured requires the strong assumption of
independence.

How can this assumption be justified? As a descriptive assumption,
its  advantage  is  that  it  yields  a  relatively  tractable,  testable  theory. 
However,  both  introspection  and  casual  observation  of  the  phenomena  of 
complementation  and  substitution  suggest  that  in  many  circumstances  the 
theory  presented  here  will  be  at  best  only  approximately  valid.  What¬ 
ever  descriptive  value  this  theory  may  have  can  only  be  assessed  in  the 
presence  of  data  and  alternative  theories  to  account  for  those  data; 
therefore  we  should  not  rule  out  independence  as  an  empirical  assumption 
that  may  be  reasonably  valid  in  some  circumstances,  invalid  in  others. 

Can  independence  be  justified  as  an  assumption  in  creating  a  norm¬ 
ative  theory?  Again,  the  answer  is  probably  "yes"  in  many — but  obvi¬ 
ously  not  all — circumstances.  Applied  decision  theory  provides  a  body 
of  techniques  that  will  assist  decisionmakers  faced  with  complex  alter¬ 
natives.  Analyses  such  as  this  can  then  assist  by  breaking  complicated 
decisions  into  simpler  ones — for  example,  by  ignoring  interdependencies 
among  time  periods  and  discounting.  It  must  be  decided  in  each  case 
whether  the  conceptual  clarification  of  the  problem  resulting  from  the 
abstraction  gains  more  than  the  information  ignored  loses.  The  in¬ 
creased  utilization  (and  advocacy)  of  present-value  decision  criteria 
suggests  that  in  many  decision  situations  the  simplification  is  worth¬ 
while.  However,  assuming  that  independence  will  in  many  cases  be  only 
an  approximation  sets  this  theory  apart  from  that  of  Savage  in  an  im¬ 
portant  way.  In  the  uncertainty  context,  the  independence  assumption 
has  sufficient  intuitive  force  that  the  Savage  system  may  be  considered 
unconditionally  normative;  the  time-preference  interpretation  can  be 
considered  only  approximately  normative. 

V.  DISCUSSION

The theory developed herein is related in various ways to other
theories of inter-temporal choice. Perhaps the best known among
economists is that of Fisher [6].* My work here abstracts away from
discussion of market and physical investment opportunities, all of
which are subsumed in the consumption streams available within the
set D. The present study adds to earlier work in its capability to
crisply separate pure time preference from the utility of money, as
these variables enter into economic choice (a distinction which is
impossible to make precise within the approach of Fisher). This same
point is also the primary advantage of the present theory over a
recent axiomatic theory of Lancaster [13].

Samuelson [27] pointed out that if we assume that a decisionmaker
maximizes present value of utility and that he discounts "...in some
simple regular fashion that is known to us...," then, by observing
his actual choices, "...we shall be able to deduce the actual shape
of the utility function, invariant except for a linear
transformation...."** The principal conceptual advance of the theory
presented in this dissertation over Samuelson's is that, instead of
assuming the discount function to be known, it is shown to be
conjointly measurable with the utility function. Enzer [4]
independently, but almost thirty years later, obtained results very
similar to those of Samuelson; the relationship between the present
theory and those of Enzer and Samuelson is discussed further in
Ref. 18.

* See also Hirshleifer's [8] extension of Fisher's theory.

** This seems a remarkable observation to have been made ten years
before The Theory of Games and Economic Behavior, 2d ed.

Williams and Nassar [36] and Fishburn* discuss ways of obtaining
discount factors without considering cardinal utility. Koopmans,
Diamond, and Williamson [11] place axioms on inter-temporal utility
functions that guarantee "impatience" and "time perspective" as
properties of the utility functions. However, their study does not
involve axioms concerning preferences that will insure the
measurability of time preference and utility. Koopmans [10] has
recently extended his previous work to consideration of axioms
concerning preferences. Koopmans proves a theorem that, essentially,
guarantees the measurability of time preference and utility. The
principal difference between Koopmans' approach and my approach is
that by way of Axiom 6 I am able to provide sufficient fineness to
the set of temporal partitions to prove the existence of a discount
function that strictly agrees with the qualitative relation "is more
discounted than". Koopmans, on the other hand, proceeds by adding
what Luce and Suppes [17] call a structural condition -- in his case,
the assumption of stationarity -- to guarantee the existence of a
strictly agreeing discount function. The stationarity assumption is
analogous (in the probability context) to an assumption of
equiprobable atomic events. This dissertation presents a more general
approach than that of Koopmans in that the rate of discount need not
be constant or, in the short run, even positive. (Corollary 1 assures
that it is positive in the long run.) Another difference is that,
unlike the author, Koopmans assumes and uses a continuous structure
on the set X of outcomes.

* P. C. Fishburn, manuscript.

It must be emphasized that the present work represents but a limited
step in the direction of understanding choice involving time. The
problem of uncertainty, discussed in more detail in the next part, has
yet to be thoroughly resolved; and the interrelated problems of
consistency of choice and desire for flexibility in future choice also
remain. Axiom 2 (independence) should be further examined: Can an
interesting representation be proved if it is weakened? How can memory
and anticipation (both crucial to understanding inter-temporal choice)
be taken into account? To what extent is the type of theory presented
here intended to be descriptive? What are the psychological experiments
or economic observations that would support or refute it? And to what
extent is this sort of theory supposed to be normative, i.e., how can
it be profitably woven into the fabric of applied decision analysis?

These, then, are a few of the questions that remain to be answered
through future research in this area. It is hoped that the theory
presented here will provide a useful step toward such solutions.
Part Two/Three

FORMAL  THEORY  OF  DECISIONS 
UNDER  UNCERTAINTY  INVOLVING  TIME 

In  this  Part  I  attempt  to  extend  the  analysis  of  Part  Two/Two  to 
situations  where  the  options  available  to  a  decision-maker  involve  un¬ 
certainty  as  well  as  time.  My  analysis  here  has  two  aspects.  First 
I  look  at  a  particularly  simple  class  of  intertemporal  uncertain  options 
and  prove  a  somewhat  weak  result  concerning  them.  Next  I  state  axioms 
that  I  conjecture  will  suffice  for  the  general  case. 

I. THE DISCOUNTED EXPECTED UTILITY MODEL FOR SIMPLE OPTIONS

Consider a set A of prizes (e.g., amounts of money), a set E of
uncertain events, and a set T of future points in time. An "option"
is a set of triples of the form (a, e, t) with a ∈ A, e ∈ E, and
t ∈ T. A "simple" option is an option containing only one triple. An
individual will be said to choose among options in accord with the
discounted expected utility (DEU) model if there exist real valued
functions u on A, p on E, and d on T such that one option is preferred
to another if and only if its DEU is greater. The DEU of an option is
the sum over all triples (a, e, t) in the option of the product
u(a)p(e)d(t).
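
The definition can be made concrete with a minimal sketch. The prizes, events, dates, and the numerical values of u, p, and d below are hypothetical, chosen only to show how the DEU of an option is computed from its triples.

```python
def deu(option, u, p, d):
    """Discounted expected utility: sum of u(a) * p(e) * d(t) over the triples."""
    return sum(u[a] * p[e] * d[t] for (a, e, t) in option)


# Hypothetical scales on prizes, events, and points in time.
u = {"$0": 0.0, "$10": 1.0, "$25": 2.0}
p = {"rain": 0.3, "no rain": 0.7}
d = {"now": 1.0, "next year": 0.9}

# An option is a set of (prize, event, time) triples.
option_1 = {("$25", "rain", "now"), ("$10", "no rain", "now")}
option_2 = {("$25", "no rain", "next year")}  # a "simple" option: one triple

deu_1 = deu(option_1, u, p, d)
deu_2 = deu(option_2, u, p, d)
option_1_preferred = deu_1 > deu_2
```

Under these assumed numbers the first option has the larger DEU, so the model would have it preferred to the second.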

My  purpose  in  this  section  is  to  state  a  very  simple  theorem  that 
indicates when the DEU model holds for simple options. This result is
a  straightforward  extension  of  some  results  of  Tversky  [33]  concerning 
what  I  would  call  simple  options  having  no  time  component. 
Let O be the set of simple options, that is, $O = A \times E \times T$.
Let P be a preference relation on O; the structure (O, P) will be
called "additive" if there exist functions f on A, g on E, and h on T
such that for all $o_i, o_j \in O$,

$$o_i \, P \, o_j \iff f(a_i) + g(e_i) + h(t_i) \geq f(a_j) + g(e_j) + h(t_j),$$

where $o_i = (a_i, e_i, t_i)$, etc. The structure (O, P) will be
called Luce-Tukey (L-T) additive if it obeys the axioms of Luce and
Tukey [18] as modified by Luce [16]. (The relevant modification
extends the two factor results of L-T to any finite number of
factors -- three for the case considered here.)

THEOREM. For simple options the DEU model is satisfied if and
only if (O, P) is additive.

PROOF. This proof requires only minor modification from that of
Theorem 1.3 in Tversky [33]. First assume (O, P) is additive. Then
there exist functions f, g, and h such that (a, e, t) P (a', e', t')
if and only if f(a) + g(e) + h(t) ≥ f(a') + g(e') + h(t'). Let
u(a) = exp[f(a)], p(e) = exp[g(e)], and d(t) = exp[h(t)]. Clearly,
then, (a, e, t) P (a', e', t') if and only if u(a) p(e) d(t) ≥
u(a') p(e') d(t') and thus the DEU model is satisfied. Next assume the
DEU model is satisfied. By taking logs of the u, p, and d assumed to
exist (these must be positive for the logarithms to be defined) it
is easy to show the existence of an additive representation, which
completes the proof.
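
The exponential step in the proof can be checked numerically on a small example. The sketch below assumes hypothetical additive scales f, g, and h, forms u, p, and d by exponentiation, and confirms that the additive and multiplicative rankings of all simple options agree. It illustrates the argument and is not a substitute for it.

```python
import itertools
import math

# Hypothetical additive scales on prizes, events, and times.
f = {"a1": 0.0, "a2": 1.0, "a3": 2.5}
g = {"e1": -0.1, "e2": -0.7}
h = {"t1": 0.0, "t2": -0.25}

# Exponentiate each scale, as in the first half of the proof.
u = {a: math.exp(v) for a, v in f.items()}
p = {e: math.exp(v) for e, v in g.items()}
d = {t: math.exp(v) for t, v in h.items()}

# All simple options (a, e, t).
options = list(itertools.product(f, g, h))


def additive(option):
    a, e, t = option
    return f[a] + g[e] + h[t]


def multiplicative(option):
    a, e, t = option
    return u[a] * p[e] * d[t]


# The two representations order every pair of simple options identically.
orders_agree = all(
    (additive(x) >= additive(y)) == (multiplicative(x) >= multiplicative(y))
    for x in options
    for y in options
)

# Taking logs recovers the additive scale, as in the second half of the proof.
logs_recover = all(math.isclose(math.log(u[a]), f[a], abs_tol=1e-12) for a in f)
```

The agreement follows from the monotonicity of the exponential. The scale values were chosen so that no two distinct options tie, which keeps the floating-point comparison unambiguous.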

It is clear, then, that the L-T axioms, since they suffice for
additivity, imply the validity of the DEU model for simple options.
What the axioms assert, very loosely speaking, is that: (i) P is a
weak order; (ii) given a, a', e, e', and t there exists a t' such
that (a, e, t) is indifferent to (a', e', t'), and similarly for the
sets A and E; (iii) for each component the ordering induced on the
set of which that component is a member by varying that component is
independent of the values at which the other two components are held;
and (iv) there is a rather fine structure to the sets A, E, and T.

On  the  surface  these  axioms  seem  rather  plausible,  though  if  (ii) 
is  to  be  accepted  events  regarded  as  impossible  must  be  excluded  from 
E. (Alternatively, Luce [16] weakens (ii) in a way such that this sort
of restriction on E would be unnecessary.) In addition to the
plausibility of the axioms, an attractive feature of the model is its
empirical testability; this is the sort of model I plan to use for the
experiment outlined at the end of Part Two/One.

The model has one serious drawback, however, that Tversky does not
seem explicitly aware of. The drawback is that p need not be a
probability measure and d need not satisfy certain term structure
properties required for a discounting function. Additional axioms are
required to get these results and in the next subsection of this Part
I will try to indicate (though I cannot prove) how this should be done.

II. SIMULTANEOUS MEASUREMENT OF PROBABILITY AND TIME PREFERENCE

As in the preceding paragraphs I shall in this subsection attempt
to use the additive model of Luce and Tukey as a basis for the
representation desired. The basic subject matter comprises a set T of
points in time, a set E of events, and a relation < on H = T* × E*,
where T* and E* are algebras of subsets of T and E. The set H is the
set of "happenings"; the intuitive notion here is that if one receives
a prize "on" h = (t*, e*) ∈ H then one has access to that prize (may
use the prize) during all t ∈ t* if event e* occurs. If h ∈ H then so
is ~h, where ~h happens if ~t* or ~e*. That is, ~h happens if h fails
to happen; since both T* and E* are algebras, h ∈ H implies ~h ∈ H.

Consider now two prizes, p and q, with p really preferred to q.
Consider also two happenings h = (t*, e*) and h' = (t*', e*') and let
us say that we are faced with a choice between two options. In option 1
we get p if h happens or q if ~h happens; in option 2 we get p if h'
happens, q if ~h'. What are the considerations that would lead us to
choose option 1 over option 2? If for both h and h' we had access to
the prize at the same time (i.e., t* = t*') clearly we would prefer
option 1 if we judged e* to be more likely than e*'. On the other
hand, if e* = e*' we would tend to prefer option 1, given a positive
rate of time preference, if t* were sooner than t*' and they were of
about equal length, etc. In sum, we would judge option 2 inferior to
option 1 if h' were less totally discounted than h. If h' is less
totally discounted than h, I will denote this by h' < h.

(I am choosing to take < as a primitive relation here. It would
be possible, in the manner of Savage [28], to include the set of prizes
in the basic subject matter of the theory and have the primitive
relation be that of preference among acts. If that were done, an axiom
would be required to assure that, in the language of my previous
discussion, if option 1 were preferred to option 2 for any p and q (with
p definitely preferred to q), option 1 would be preferred to option 2
for all p' and q' if p' were preferred to q'. A theory including the
set of prizes would not really be more general than the one I am
discussing. The reason is that once discount weights have been assigned
to each h ∈ H, these weights can be used to calibrate cardinal
utilities in the manner of von Neumann and Morgenstern [35]. This is
essentially what Savage does anyway.)

My basic intention here is to place axioms on the structure
(H, T*, E*, <) that will do the following: (i) guarantee the existence
of a probability measure p on E*, (ii) guarantee the existence of a
discounting function d on T*, and (iii) for h = (t*, e*) and
h' = (t*', e*') ∈ H, have h < h' if and only if
d(t*) p(e*) ≤ d(t*') p(e*'). I cannot at present state axioms from
which I can prove the desired representation. However, my conjecture
is that the following general strategy will suffice.

First, apply Luce's [16] modification of the L-T system to the
structure (H, T*, E*, <). This modification will allow there to exist
elements that cannot be compensated, for example, the probability of
the null event. From these axioms it is clear that functions f and g
on T* and E* exist that satisfy property (iii) in the paragraph above.
Also, it is clear that there exist weak orders on T* and E* that
correspond to the notions of "more discounted than" and "more probable
than". We can add new axioms for these weak orders to obtain the
required probability and discount measures, p and d. (An attractive set
of axioms is that of Luce [25]; the same axioms will serve for both
p and d because of the formal similarity between probability and
discount measures that was pointed out in Part Two/Two.)

The  basic  remaining  formal  problem  is  this.  The  functions  f  and  g 
satisfying  the  additive  conjoint  measurement  are  clearly  monotonically 
consistent  with  the  functions  p  and  d,  since  they  represent  the  same 
underlying weak order. However, p and d are unique. The question
then is: do there exist f' and g' satisfying the conjoint
axiomatization such that f' = p and g' = d? It seems intuitively clear to
me  that  the  answer  here  is  "yes",  for  the  following  reason.  Interpret 
T  as  well  as  E  as  a  set  of  random  events  and  have  the  members  of  T  be 
probabilistically  independent  of  E.  Then  the  set  H  is  the  set  of  joint 
events  and  clearly  the  ordering  of  the  probabilities  of  the  joint 
events  will  be  consistent  with  the  ordering  induced  by  the  product  of 
the  probabilities  of  the  component  events.  Thus  I  do  feel  that  I  will 
be able eventually to prove the conjecture with which I close Section
Two. 
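
The representation sought in (iii) can be illustrated, though of course not established, by a small computation. The sketch assumes that p and d are both additive set functions built from hypothetical atomic weights (normalizing the discount weights to sum to one is an assumption made here for symmetry with probability), and it ranks happenings by the product d(t*) p(e*).

```python
from itertools import combinations

# Hypothetical atomic weights; neither set of numbers comes from the text.
event_prob = {"e1": 0.5, "e2": 0.3, "e3": 0.2}   # probabilities of atomic events
period_weight = {0: 0.40, 1: 0.35, 2: 0.25}      # discount weights of periods


def subsets(atoms):
    """All subsets of a finite set: the algebra generated by its atoms."""
    atoms = list(atoms)
    return [
        frozenset(c)
        for r in range(len(atoms) + 1)
        for c in combinations(atoms, r)
    ]


def p(e_star):
    # Additive probability measure on subsets of E.
    return sum(event_prob[e] for e in e_star)


def d(t_star):
    # Additive discount measure on subsets of T.
    return sum(period_weight[t] for t in t_star)


def weight(h):
    # Total discount weight of a happening h = (t*, e*).
    t_star, e_star = h
    return d(t_star) * p(e_star)


happenings = [(t, e) for t in subsets(period_weight) for e in subsets(event_prob)]
ranked = sorted(happenings, key=weight)  # least to greatest weight
```

Reading T as a second set of random events independent of E, as suggested above, `weight` is simply the probability of the joint event, which is why the product ordering seems the natural candidate.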
Section Three

INFORMATION  AND  CHOICE 

Uncertain  events  generally  determine  the  outcome  of  a  decision¬ 
maker's  choice;  this  indeterminateness  introduces  a  need  for  modifi¬ 
cation  of  a  number  of  formulations  of  classical  economic  theory.  This 
reformulation  may  be  of  a  rather  simple  technical  character — Debreu  [22], 
for  example,  simply  redefines  a  commodity  to  include  the  event  upon 
which its transfer is conditional. All the theorems concerning
economic equilibrium in a certain world apply directly to this newly
defined world in which all uncertainty is accounted for. This approach
seems intuitively unsatisfactory, I feel, because it fails to consider
information systematically as a commodity.

Arrow  [4]  has  reviewed  a  number  of  studies  of  how  treating  information 
as  a  commodity  affects  economic  theory  and  I  would  cast  some  of  the 
questions  raised  in  the  following  form: 

1.  How  can  we  quantify  information? 

2.  What  are  characteristics  of  information  as  a  commodity  that 
set  it  apart  from  other  commodities?  To  what  extent  do  these 
characteristics  raise  difficulties  for  economic  theory? 

3.  How  is  information  optimally  used? 

4.  How  is  information  actually  used? 

Section  Three  of  this  dissertation  is  primarily  concerned  with 
questions 3 and 4, though there are also some comments on 1. In Part
Three/One  I  examine  aspects  of  the  normative  problem  posed  by  question 
3  and  in  Part  Three/Two  I  examine  and  develop  a  number  of  descriptive 
theories  of  information  usage,  or  theories  of  learning. 
Normative Theories of Information Usage. Arrow [3, p. 13] has
stressed that "the influence of experience on beliefs is of the utmost
importance for a rational theory of behavior under uncertainty, and
failure to account for it must be taken as a strong objection to theories
such as Shackle's." In the paragraph preceding this comment Prof. Arrow
implicitly indicates that this rational theory would, in his view,
consist essentially of consistent utilization of Bayes' theorem. This is
a view vigorously denied by some philosophers, for example Patrick
Suppes [65], who contends that concept formation or insightful inference
is in some sense rational and cannot be accounted for in terms of Bayes'
theorem. (I should note that the Bayes' theorem view is also supported
by a number of philosophers, most prominently Prof. Carnap [17, 18], and
that in most respects the views of Suppes are rather close to Carnap's
on these matters.) This issue of the sufficiency of Bayes' theorem for
a rational account of belief change seems to me to raise two questions:

1. What conceptual alternative is there to Bayes' theorem?

2. To what extent can clever use of Bayes' theorem account for
'rational' seeming concept learning behavior?

I know of no positive answer to question 1. One of the major purposes
of Part Three/One is to provide a partial answer to question 2, that is,
to show that Bayes' theorem may well be applicable in certain concept
learning tasks. I feel that Bayes' theorem is not the end of a theory
of rational information usage but rather its beginning. The issues to
pursue are how one characterizes the event space in such a way that
any structure it may have becomes apparent and how one assigns prior
probabilities over that space; the results in Part Three/One depend on
doing this in specific ways. (The assertion that assignment of priors
is a valid aspect of a theory of rational choice is, incidentally, the
primary distinguishing feature between adherents of 'logical' and
'personalistic' theories of probability -- see Carnap [19].)

I  must  say  that  I  see  no  way  at  present  of  integrating  the  material 
of  Part  Three/One  into  the  mainstream  of  economic  theory.  As  an  obvi¬ 
ously  essential  aspect  of  the  theory  of  individual  choice  behavior  it 
stands  on  its  own  as  a  component  of  microeconomic  theory.  The  question 
remains,  however,  of  whether  this  approach  will  prove  suggestive  in 
addressing  any  larger  economic  issues  such  as,  for  example,  determinants 
of investment in research and development or dissemination of new
technique.

Descriptive  Theories  of  Information  Usage.  Since  the  early  1950s 
mathematically  formulated  theories  of  information  usage  (or  learning) 
have played an increasingly important role in psychology. In 1958
Prof.  Arrow  [2,  p.  13]  predicted  that  these  theories  would  have  a 
major  influence  in  economics:  "Learning  is  certainly  one  of  the  most 
important  forms  of  behavior  under  uncertainty.  In  this  field,  recent 
work  is  giving  rise  to  results  which  may  have  very  striking  impact  on 
economic  thought."  I  think  it  fair  to  say  that  this  prediction  has 
not  yet  been  borne  out.  There  seem  to  me  to  be  three  major  reasons 
for  this: 

First,  in  attempts  to  provide  empirically  adequate  theories,  psy¬ 
chological  theorists  have  introduced  a  complexity  into  their  choice 
models  that  renders  them  difficult  to  integrate  into  more  aggregate 
theories. Luce and Suppes [41A, p. 253] stress this point: "While
being elaborated as distinct and testable psychological theories, the
theories of preference [including learning] have begun to acquire a
richness and complexity -- hopefully reflecting a true richness and
complexity of behavior -- that renders them largely useless as bases for
economic and statistical theories. Perhaps we may ultimately find
simple, yet reasonably accurate, approximations to the more exact
descriptions of behavior that can serve as psychological foundations for
other theoretical developments, but at the moment this is not the main
trend."

Second,  since  detailed  theories  of  learning  and  choice  are  most  cen¬ 
trally  the  concern  of  the  psychologist,  economists  have  probably  felt 
little  need  to  do  active  research  in  this  area.  This  contrasts  sharply 
with  detailed  studies  of  firm  behavior;  though  such  studies  are  natural 
analogs  of  detailed  study  of  individual  choice  behavior,  there  is  no 
other  discipline  specifically  concerned  with  those  problems.  Thus  the 
study  of  firm  behavior  is  a  more  natural  focus  for  economic  research. 

Third,  theories  of  learning  have  generally  been  constructed  only 
for highly artificial tasks with information structures of an unusually
unrealistic sort. It is primarily for this last sort of reason, I feel,
that learning theory has had almost as little serious application in
education  as  it  has  in  economics. 

The  primary  purpose  of  Part  Three/Two  is  related  to  lessening  the 
thrust  of  the  third  comment  above.  In  that  part  a  variety  of  new  theo¬ 
retical  models  are  presented  to  account  for  situations  dealt  with  in 
previous  work  in  learning  theory.  Then  the  class  of  situations  con¬ 
sidered  is  broadened  to  include  analysis  of  situations  where  there  is 
only  incomplete  information  of  various  types  in  the  reinforcement  set. 
This sort of incomplete information is much more typical of economic
situations in both consumption and production than is the complete
information case. Nevertheless, even the models treated here can only
be  considered  rather  abstract  idealizations  of  real  life  behavior. 

One possible source of data for testing these models in a more
realistic environment might come from the partially computer based
microeconomic theory course that Martin Shubik and R. Levitan are
developing. Included in this course will be 20 exercises (of about an
hour's length) at a computer based teletype. The student will be asked
to take the role of, say, a monopolist and will be forced to make the
sort of price, quantity, advertising, etc., decisions that a monopolist
must make. The student will make a series of decisions receiving along
the way information concerning the consequences of his previous
decisions. Prof. Shubik told me that one of his purposes in constructing
this course is to obtain detailed empirical information concerning
individual choice behavior where the individual is acting as
representative of a firm. Certain of the models developed in Part
Three/Two may be of use in analyzing these data, particularly those
models assuming a continuum of response alternatives.

Let me end these introductory comments to this Section by suggesting
the possibility that there may in the future develop a theory of general
economic equilibrium based on descriptive stochastic models rather than,
as at present, on normative deterministic ones. The elements that need
to be integrated in a systematic way are: (i) stochastic theories of
preference and learning, (ii) stochastic theories of the firm, such as
that pioneered by Newman and Wolfe [47A], and (iii) stochastic theories
of market adjustment such as I am now working on -- Jamison [36].
Part Three/One


INFORMATION  AND  INDUCTION:  A  SUBJECTIVISTIC 
VIEW  OF  SOME  RECENT  RESULTS* 


I.  INTRODUCTION 


We  might  distinguish  between  inductive  and  deductive  inferences 
in  the  following  way:  Deductive  inferences  refer  to  the  implications 
of  coherence  for  a  given  set  of  beliefs,  whereas  inductive  inferences 
follow  from  conditions  for  'rational'  change  in  belief.  Change  in 
belief, I shall argue in subsection II, is perhaps the most
philosophically relevant notion of semantic information. Thus rules
governing inductive inferences may be regarded as rules for the acquisition
of  semantic  information. 

I have four purposes in this part. First I shall attempt to provide
a definition of semantic information that is adequate from a
subjectivist point of view and that is based on the concept of
information as change in belief. From this I shall turn to a
subjectivistic theory of induction; the second purpose of this work is
to suggest a solution to the inductive problem that Suppes [62,
pp. 514-515] points out to lie at the foundations of a subjectivistic
theory of decision. (By this I do not mean to suggest a solution to the
inductive problem of Hume; I would agree with Savage [57] that the
subjective theory of probability simply cannot do this.) The third
thing I wish to do is to show how Carnap's continuum of inductive
methods may be easily interpreted as a special case of the
subjectivistic theory of induction to be presented. Finally, I provide
a subjectivistic interpretation of Hintikka's two dimensional inductive
continuum, and show how this is related to the problem of concept
formation.

* Footnotes in this part are numbered consecutively and appear at
the end of the part.
II. SEMANTIC INFORMATION AND INDUCTION

Two Notions of Semantic Information

Two alternative notions of semantic information are reduction in
uncertainty and change in belief. Reduction in uncertainty is, clearly,
a special case of change in belief. Information is defined in terms of
probabilities; hence, one's view of the nature of probability is
inevitably an input to his theory of information. As there are three
prominent views concerning the nature of probability--the relative
frequency, logical, and subjectivist views--and there are the two
concepts of information just mentioned, we can distinguish six
alternative theories of information. Table 1 arrays these theories.

Table 1. Theories of Information

                                     Concept of Probability
Concept of Information      Relative Frequency    Logical    Subjective

Change in Belief                    CR               CL          CS
Reduction of Uncertainty            RR               RL          RS


RS, for example, would be a theory of information based on a
subjectivist view of probability and a reduction of uncertainty approach
to information. The development of the RR theory by Shannon [58] has
provided the formal basis for most later work. Carnap and Bar-Hillel [20]
developed RL and Bar-Hillel [8, 9] hints at the potential value of
developing what I would call RS or CS, though his precise meaning is
unclear. Sneed's [61] discussion of "pragmatic informativeness" is
related. Smokler [59] as well as Hintikka and Pietarinen [27] have
further developed RL.

An undesirable feature of RL is that in it logical truths carry
no information. For example, solving (or being told the solution of)
a difficult differential equation gives you no new information. This
is a result of accepting the "equivalence condition," ramifications
of which are discussed by Smokler [60]. R. Wells [72] has made an
important contribution to the development of RS by beginning a theory
of the information content of a priori truths. To continue the
example above, Wells allows that the solution to the differential
equation may, indeed, give information. R. A. Howard's [90] paper on
"information value theory" develops RS in a decision-theoretic context,
deriving the value of clairvoyance and using that value as the upper
bound to the value of any information. McCarthy [43] has also
developed a class of measures of the value of RS information.

Two further works concerning semantic information and change in
belief should be noted. MacKay [42] has developed techniques of
information theory to analyze scientific measurement and observation.
His view may be considered a change in belief view. In a more recent
work Ernest Adams [1] has developed a theory of measurement in which
information theoretic considerations play an important role. It seems
to me that one interpretation of his approach would be that the purpose
of measurement is simply the attainment of semantic information, though
Adams would not agree with this. Throughout Adams uses a frequency
interpretation of probability.

Initiating a CS Theory of Information

What seems to me to be the most natural notion of semantic information
is change in belief as reflected in change in subjective probabilities.
That is, I would regard CS as the most fundamental entry in the table
shown above, at least from a psychologist's or philosopher's point of
view. There are two primary reasons for this. The first is that change
in belief is a more general notion than reduction of uncertainty,
subsuming reduction in uncertainty as a special case. The second is
that reality is far too rich and varied to be adequately reflected in
a logical or relative frequency theory of probability.

Let  me  now  turn  to  definitions  of  belief  and  information. 

Consider a situation in which there are m mutually exclusive and
collectively exhaustive possible states of nature. Define an m-1
dimensioned simplex, $\Xi$, in m dimensioned space in the following
manner:

$$\Xi = \Big\{ \xi \;\Big|\; \sum_{i=1}^{m} \xi_i = 1 \text{ and } \xi_i \ge 0 \text{ for } 1 \le i \le m \Big\}.$$

The vector $\xi = (\xi_1, \xi_2, \ldots, \xi_m)$ intuitively corresponds
to a probability distribution over the states of nature; $\xi_i$ is the
probability of the ith state of nature. $\Xi$ is the set of all possible
probability distributions over the m states of nature. For these
purposes a belief may be simply defined as a subjectively held vector
$\xi$. Measurement of belief is an example of "fundamental" measurement
and the conditions under which such measurement is possible are simply
the conditions that must obtain in order that a qualitative probability
relation on a set may be represented by a numerical measure.
Information is an example of "derived" measurement.

Roby [55] has an interesting discussion of belief states defined in
this way. Let $\xi$ be a person's beliefs before he receives some
information (or message) M, and $\xi'$ his beliefs afterwards. The
notion of message here is to be interpreted very broadly -- it may be
the result of reading, conversation, observation, experimentation, or
simply reflection. The primary requirement of a definition of the
amount of information in the message M, inf(M), is that it be a
(strictly) increasing function of the "distance" between $\xi$ and
$\xi'$. Perhaps the simplest definition that satisfies this requirement
is:

$$\mathrm{inf}(M) = |\xi - \xi'| = \Big[ \sum_{i=1}^{m} (\xi_i - \xi_i')^2 \Big]^{1/2}. \tag{1}$$

A drawback to this definition is that the amount of information
is relatively insensitive to m. Consider two cases where in the first
$m = 4$ and in the second $m = 40$. In each $\xi_i = 1/m$ for
$1 \le i \le m$, and $\xi_1' = 1$ and $\xi_i' = 0$ for $i > 1$. It would
seem that in some sense in the case where m equaled 40 a person would
have received much more information than if m had equaled 4, and the
Shannon measure of information, for example, reflects this intuition.
However, for $m = 4$, the information received as defined in (1) is
.866, and for $m = 40$ it is .987, a rather small difference. An
alternative definition, that takes care of this defect, is:


inf(M) = [right-hand side illegible in the source]          (2)


The apparent complexity makes some numbers come out nicely; from the
preceding example, when $m = 4$ the information conveyed as measured
by (2) is 2. For $m = 40$, it is 20.
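The insensitivity of a distance-based measure to m can be checked numerically. The following is a minimal sketch, assuming that definition (1) is the ordinary Euclidean distance between the prior and posterior belief vectors; the function names are illustrative and do not come from the text. It prints the distance alongside the Shannon figure $\log_2 m$ for comparison.

```python
import math

def inf_euclidean(prior, posterior):
    """Euclidean distance between two belief vectors (assumed reading of definition (1))."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(prior, posterior)))

def uniform_to_certainty(m):
    """Prior 1/m on each of m states; posterior puts probability 1 on the first state."""
    prior = [1.0 / m] * m
    posterior = [1.0] + [0.0] * (m - 1)
    return prior, posterior

if __name__ == "__main__":
    for m in (4, 40):
        prior, posterior = uniform_to_certainty(m)
        distance = inf_euclidean(prior, posterior)
        shannon_bits = math.log2(m)  # uncertainty removed, in bits
        print(m, round(distance, 3), round(shannon_bits, 3))
```

Under this reading the distance equals $\sqrt{(m-1)/m}$, which approaches 1 as m grows, while the Shannon figure grows without bound.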

The definitions in equations (1) and (2) are meant merely to show
that a CS theory of information can be discussed in a clear and formal
way. Implications of these definitions -- or alternatives to them --
must await another time, as the rest of this paper will be concerned
primarily with induction.

Semantic Information and Induction

For purposes of discussing induction we might consider three
levels of inductive inference. The first and simplest level is simply
conditionalization, or the updating of subjective probabilities by
means of Bayes' theorem. That this is the normatively proper way to
proceed in some instances seems undeniable. A more complicated level
of inductive inference concerns inferences made on the basis of the
formation of a concept. The highest level of inductive inference
comprises inductions made from scientific laws, by which I simply mean
mathematical models of natural phenomena. The distinction between the
second and third levels of inference is that models have parameters to
be evaluated whereas concepts do not.

A question of some interest concerning philosophical theories of
induction is whether some form of Bayesian updating will suffice for a
normative account of inductive behavior at the second and third levels.
Suppes [66] answers the question just asked with a clear "no." He
summarizes his position in the following way:

"The core of the problem is developing an adequate
psychological theory to describe, analyze, and predict the
structure imposed by organisms on the bewildering complexities
of possible alternatives facing them. I hope I have made
it clear that the simple concept of an a priori distribution
over these alternatives is by no means sufficient and
does little toward offering a solution to any complex
problem."

Suppes even suggests that in cases where Bayes' theorem would fairly
obviously be applicable a person might not be irrational to act in some
other way. While I cannot see the rationale for this, the points he
makes about concept formation and, implicitly, about the construction
of scientific laws seem well taken. To put this in the context of our
discussion of semantic information I would suggest that a concept has
been formed when a person acquires much semantic information (i.e.,
radically rearranges his beliefs) on the basis of small evidence.

In the following two sections of this paper I deal with inductive
inference of the simplest sort. In the final section of the paper I
attempt to show that Suppes' pessimism concerning a Bayesian theory of
concept formation is partially unjustified.



III.  A  SUBJECTIVISTIC  THEORY  OF  INDUCTION 

My discussion of induction will be formulated in a decision-theoretic
framework, and I will digress to problems of decision theory here and
there. The discussion of decisions under total ignorance forms the
basis for the later discussion of inductive inference, and the
intuitive concepts of that subsection should be understood, though
the mathematical details are not of major importance.

All essentials of a subjectivistic theory of induction are contained
in Bruno de Finetti's [23] classic paper. The probability of
probabilities approach developed here can be translated (though not
always simply) into the de Finetti framework; the only real
justification for using probabilities of probabilities is their
conceptual simplicity. The importance of this simplicity will, I
think, be illustrated in Sections IV and V.

A triple $P = \langle D, \Omega, U \rangle$ may be considered a finite
decision problem when: (i) D is a finite set of alternative courses of
action available to a decision-maker, (ii) $\Omega$ is a finite set of
mutually exclusive and exhaustive possible states of nature, and
(iii) U is a function on $D \times \Omega$ such that $u(d_i, \omega_j)$
is the utility to the decision-maker if he chooses $d_i$ and the true
state of nature turns out to be $\omega_j$. A decision procedure
(solution) for the problem P consists either of an ordering of the
$d_i$'s according to their desirability or of the specification of a
subset of D that contains all $d_i$ that are in some sense optimal and
only those that are optimal.

If there are m states of nature, a vector $\xi = (\xi_1, \ldots, \xi_m)$
is a possible probability distribution over $\Omega$ (with
$\mathrm{prob}(\omega_j) = \xi_j$) if and only if
$\sum_{j=1}^{m} \xi_j = 1$ and $\xi_j \ge 0$ for $1 \le j \le m$. The
set of all possible probability distributions over $\Omega$, that is,
the set of all vectors whose components satisfy the above equation and
set of inequalities, will, as in the preceding section, be denoted by
$\Xi$. Atkinson, Church and Harris [3] assume our knowledge of $\xi$ to
be completely specified by asserting that $\xi \in \Xi_0$, where
$\Xi_0 \subseteq \Xi$. If $\Xi_0 = \Xi$, they say we are in complete
ignorance of $\xi$. In the manner of Chernoff [21] and Milnor [46],
Atkinson, et al., give axioms stating desirable properties for decision
procedures under complete ignorance. A class of decision procedures
that isolates an optimal subset of D is shown to exist and satisfy the
axioms. These procedures are non-Bayesian in the sense that the
criterion for optimality is not maximization of expected utility. Other
non-Bayesian procedures for complete ignorance (that fail to satisfy
some axioms that most people would consider reasonable) include the
following: minimax regret, minimax risk (or maximin utility), and
Hurwicz's $\alpha$ procedure for extending the minimax risk approach to
non-pessimists.

The Bayesian alternative to the above procedures attempts to
order the $d_i$ according to their expected utility; the optimal act
is, then, simply the one with the highest expected utility. Computation
of the expected utility of $d_i$, $Eu(d_i)$, is straightforward if the
decision-maker knows that $\Xi_0$ is a set with but one element --
$\xi^*$:

$$Eu(d_i) = \sum_{j=1}^{m} \xi_j^* \, u(d_i, \omega_j).$$

Only in the rare instances when considerable relative frequency data
exist will the decision-maker be able to assert that $\Xi_0$ has only
one element. In the more general case the decision-maker will be in
"partial" or "total" ignorance concerning the probability vector
$\xi$. It is the purpose of the next two subsections to characterize
total and partial ignorance from a Bayesian point of view and to show
that decision procedures based on maximization of expected utility
extend readily to these cases.
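The contrast between the Bayesian ordering and a non-Bayesian rule such as maximin utility can be sketched in a few lines. This is an illustration only: the utility table and the known distribution `xi_star` are hypothetical numbers chosen for the example, not data from the text.

```python
def expected_utilities(utility, xi):
    """utility[i][j] = u(d_i, w_j); xi[j] = prob(w_j). Returns Eu(d_i) for every act."""
    return [sum(p * u for p, u in zip(xi, row)) for row in utility]

def bayes_act(utility, xi):
    """Index of the act with the highest expected utility."""
    eu = expected_utilities(utility, xi)
    return max(range(len(eu)), key=eu.__getitem__)

def maximin_act(utility):
    """Index of the act whose worst-case utility is largest (ignores xi entirely)."""
    worst = [min(row) for row in utility]
    return max(range(len(worst)), key=worst.__getitem__)

# Hypothetical two-act, three-state problem.
utility = [
    [10.0, 0.0, 2.0],  # act d_1
    [4.0, 4.0, 4.0],   # act d_2
]
xi_star = [0.6, 0.3, 0.1]  # assumed known distribution over the states

if __name__ == "__main__":
    print(expected_utilities(utility, xi_star))
    print(bayes_act(utility, xi_star), maximin_act(utility))
```

In this example the two rules disagree: the expected-utility rule selects the first act, while maximin selects the second because it never looks at the probabilities.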


Decisions  Under  Total  Ignorance 

Rather than saying that our knowledge of the probability vector
$\xi$ is specified by asserting that $\xi \in \Xi_0$ for some $\Xi_0$,
I suggest that it is natural to say that our knowledge of $\xi$ is
specified by a density, $f(\xi_1, \ldots, \xi_m)$, defined on $\Xi$. If
the probability distribution over $\Omega$ is known to be $\xi^*$, then
f is a $\delta$ function at $\xi^*$, and computation of $Eu(d_i)$
proceeds as in the introduction. At the other extreme from precisely
knowing the probability distribution over $\Omega$ is the case of total
ignorance. In this subsection a meaning for total ignorance of $\xi$
will be discussed. In the following subsection decisions under partial
ignorance -- anywhere between knowledge of $\xi$ and total ignorance --
will be discussed.

If $H(\xi)$ is the Shannon [58] measure of uncertainty concerning
which $\omega$ in $\Omega$ occurs, then
$H(\xi) = \sum_{i=1}^{m} \xi_i \log_2(1/\xi_i)$, where $H(\xi)$ is
measured in bits. When this uncertainty is a maximum, we may be
considered in total ignorance of $\omega$ and, as one would expect,
this occurs when we have no reason to expect any one $\omega_i$ more
than another, i.e., when for all i, $\xi_i = 1/m$. Analogously, we may
be considered in total ignorance of $\xi$ when $H(f)$ [...] is a
maximum. This occurs when f is a constant, that is, when we have no
reason to expect any particular value of $\xi$ to be more probable than
any other (see Chap. 3 of Shannon). If there is total ignorance
concerning $\xi$, then it is reasonable to expect that there is total
ignorance concerning $\omega$ -- and this is indeed true (if we
substitute the expectation of $\xi_i$, $E(\xi_i)$, for $\xi_i$). Let me
now prove this last assertion, which is the major result of this
subsection. While this could be proved using the rather general
theorems to be utilized in my discussion of Carnap, I think it is
intuitively useful to go into a little more detail here.
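As a small numerical illustration of the discrete case, the sketch below computes the Shannon uncertainty in bits and confirms that, among a few sample distributions over four states, the uniform one gives the largest value. The function name and the sample distributions are illustrative assumptions.

```python
import math

def shannon_entropy_bits(xi):
    """H(xi) = sum_i xi_i * log2(1 / xi_i), with zero-probability states contributing 0."""
    return sum(p * math.log2(1.0 / p) for p in xi if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
certain = [1.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    for xi in (uniform, skewed, certain):
        print(xi, round(shannon_entropy_bits(xi), 4))
```

For m equally likely states the value is $\log_2 m$, so the uniform case over four states gives exactly 2 bits.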

Proving that under total ignorance $E(\xi_i) = 1/m$ involves, first,
determination of the appropriate constant value of f, then
determination of the marginal density functions for the $\xi_i$'s and,
finally, integration to find $E(\xi_i)$.

Let the constant value of f equal K; since f is a density the
integral of K over $\Xi$ must be unity:

$$\int\!\!\int \cdots \int_{\Xi} K \, d\Xi = 1, \tag{3}$$

where $d\Xi = d\xi_1 \, d\xi_2 \cdots d\xi_m$. Our first task is to
solve this equation for K.

Since f is defined only on a section of a hyperplane in m dimensioned
space, the above integral is a many dimensioned 'surface' integral.
Figure 1 depicts the three dimensional case. As
$\sum_{i=1}^{m} \xi_i = 1$, $\xi_m$ is determined given the previous
m-1 $\xi_i$'s and the integration need only be over a region of m-1
dimensioned space, the region A in Figure 1.


Insert Figure 1 About Here

It is shown in advanced calculus that $d\Xi$ and $dA$ are related in
the following way:

$$d\Xi = \sqrt{ \sum_{i=1}^{m} \left[ \frac{\partial(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_m)}{\partial(\xi_1, \ldots, \xi_{m-1})} \right]^2 } \; dA,$$

where $x_i$ is the function of $\xi_1, \ldots, \xi_{m-1}$ that gives
the ith component of $\xi$, that is, $x_i(\cdot) = \xi_i$ for i less
than or equal to m-1 and $x_i(\cdot) = 1 - \sum_{j=1}^{m-1} \xi_j$ if
$i = m$. It can be shown that each of the m quantities that are squared
under the radical above is equal to either plus or minus one; thus
$d\Xi = \sqrt{m} \, dA$. Therefore (3) may be rewritten as follows:

$$\int\!\!\int \cdots \int_{A} K \sqrt{m} \, dA = 1, \quad \text{or}$$

$$\int\!\!\int \cdots \int_{A} d\xi_1 \, d\xi_2 \cdots d\xi_{m-1} = \frac{1}{K\sqrt{m}}. \tag{4}$$


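The multiple integral in (4) could conceivably be evaluated by
iterated integration; it is much simpler, however, to utilize a
technique devised by Dirichlet. Recall that the gamma function is
defined in the following way:
$\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x} \, dx$ for $n > 0$. If n is
a positive integer, $\Gamma(n) = (n-1)!$ and $0! = 1$. Dirichlet showed
the following (see Jeffreys and Jeffreys [39], pp. 468-470): If A is
the closed region in the first orthant bounded by the coordinate
hyperplanes and by the surface
$(x_1/c_1)^{p_1} + (x_2/c_2)^{p_2} + \cdots + (x_n/c_n)^{p_n} = 1$,
then

$$\int\!\!\int \cdots \int_{A} x_1^{\alpha_1 - 1} x_2^{\alpha_2 - 1} \cdots x_n^{\alpha_n - 1} \, dA
= \frac{c_1^{\alpha_1} c_2^{\alpha_2} \cdots c_n^{\alpha_n}}{p_1 p_2 p_3 \cdots p_n}
\cdot \frac{\Gamma(\alpha_1/p_1) \, \Gamma(\alpha_2/p_2) \cdots \Gamma(\alpha_n/p_n)}{\Gamma(1 + \alpha_1/p_1 + \alpha_2/p_2 + \cdots + \alpha_n/p_n)}. \tag{5}$$

For our purposes, $c_i = p_i = \alpha_i = 1$ for $1 \le i \le m-1$, and
the m-1 $\xi_i$'s replace the n $x_i$'s. The result is that the
integral in (4) becomes $1/\Gamma(m) = 1/(m-1)!$. Therefore
$K = (m-1)!/\sqrt{m}$.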
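The Dirichlet evaluation can be checked directly with the gamma function. The sketch below assumes the standard form of Dirichlet's integral (monomial integrand $\prod x_i^{\alpha_i - 1}$ over the region $\sum (x_i/c_i)^{p_i} \le 1$ in the first orthant); the function names are illustrative, not taken from the text.

```python
import math

def dirichlet_integral(alphas, cs, ps):
    """Integral of prod x_i**(alpha_i - 1) over x_i >= 0, sum (x_i / c_i)**p_i <= 1,
    evaluated with the gamma-function formula (assumed standard Dirichlet form)."""
    prefactor = 1.0
    gamma_product = 1.0
    exponent_sum = 0.0
    for a, c, p in zip(alphas, cs, ps):
        prefactor *= c ** a / p
        gamma_product *= math.gamma(a / p)
        exponent_sum += a / p
    return prefactor * gamma_product / math.gamma(1.0 + exponent_sum)

def simplex_volume(m):
    """Volume of the region A (m - 1 coordinates, all alpha_i = c_i = p_i = 1)."""
    n = m - 1
    return dirichlet_integral([1.0] * n, [1.0] * n, [1.0] * n)

def density_constant(m):
    """Constant K solving K * sqrt(m) * volume(A) = 1."""
    return 1.0 / (simplex_volume(m) * math.sqrt(m))

if __name__ == "__main__":
    for m in (2, 3, 4, 8):
        print(m, simplex_volume(m), density_constant(m))
```

As a sanity check on the formula itself, setting two variables with $p_i = 2$ gives the area of a quarter of the unit disk, $\pi/4$.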

Having determined the constant value, K, of f we must next determine
the densities $f_i(\xi_i)$ for the individual probabilities. By
symmetry, the densities must be the same for each $\xi_i$. The
densities are the derivatives of the distribution functions, which will
be denoted $F_i(\xi_i)$. $F_1(c)$ gives the probability that $\xi_1$ is
less than c; denote by $F_1^*(c)$ the probability that $\xi_1 \ge c$,
that is, $F_1(c) = 1 - F_1^*(c)$. $F_1^*(c)$ is simply the integral of
f over $\Xi_c$, where $\Xi_c$ is the subset of $\Xi$ including all
points such that $\xi_1 \ge c$. See Fig. 2. $F_1^*(c)$ is given by:


Insert  Figure  2  About  Here 


$$F_1^*(c) = \int\!\!\int \cdots \int_{\Xi_c} f \, d\Xi = \int\!\!\int \cdots \int K \sqrt{m} \, dA. \tag{6}$$

Since $K = (m-1)!/\sqrt{m}$, (6) becomes (after inserting the limits of
integration):

$$F_1^*(c) = (m-1)! \int_c^1 \int_0^{1-\xi_1} \cdots \int_0^{1-\xi_1-\cdots-\xi_{m-2}} d\xi_{m-1} \, d\xi_{m-2} \cdots d\xi_1. \tag{7}$$

A translation of the $\xi_1$ axis will enable us to use Dirichlet
integration to evaluate (7); let $\xi_1' = \xi_1 - c$. Then
$\xi_1' + \xi_2 + \cdots + \xi_{m-1} = 1-c$, or
$\xi_1'/(1-c) + \xi_2/(1-c) + \cdots + \xi_{m-1}/(1-c) = 1$ (since
$\sum_{i=1}^{m-1} \xi_i = 1$ is the boundary of the region A).
Referring back to equation (5) it can be seen that the $c_i$'s in that
equation are all equal to 1-c and that, therefore, the integral on the
r.h.s. of (7) is $(1-c)^{m-1}/\Gamma(m)$. Thus
$F_1^*(c) = [(m-1)!\,(1-c)^{m-1}]/\Gamma(m) = (1-c)^{m-1}$. Therefore
$F_1(c) = 1 - (1-c)^{m-1}$. Since this holds if c is set equal to any
value of $\xi_1$ between 0 and 1, $\xi_1$ can replace c in the
equation; differentiation gives the probability density function of
$\xi_1$ and hence of all the $\xi_i$'s:

$$f_i(\xi_i) = (m-1)(1-\xi_i)^{m-2}. \tag{8}$$

From (8) the expectation of $\xi_i$ is easily computed --
$E(\xi_i) = \int_0^1 \xi_i (m-1)(1-\xi_i)^{m-2} \, d\xi_i$. Recourse to
a table of integrals will quickly convince the reader that
$E(\xi_i) = 1/m$. Figure 3 shows $f_i(\xi_i)$ for several values of m.
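In place of a table of integrals, the expectation can be checked numerically. The sketch below assumes the marginal density $f_i(\xi_i) = (m-1)(1-\xi_i)^{m-2}$ stated in (8) and integrates it with a simple midpoint rule; the helper names and the number of subintervals are illustrative choices.

```python
def marginal_density(x, m):
    """Marginal density of a single xi_i under total ignorance, as in equation (8)."""
    return (m - 1) * (1.0 - x) ** (m - 2)

def midpoint_integral(g, n=20000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def total_mass(m):
    """Should be close to 1 if (8) is a proper density."""
    return midpoint_integral(lambda x: marginal_density(x, m))

def expected_xi(m):
    """Numerical value of E(xi_i); the text asserts this equals 1/m."""
    return midpoint_integral(lambda x: x * marginal_density(x, m))

if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, round(total_mass(m), 6), round(expected_xi(m), 6), round(1.0 / m, 6))
```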


Insert  Figure  3  about  here 

Jamison and Kozielecki [37] have determined empirical values of
the function $f_i(\xi_i)$ for m equal to two, four, and eight.* The
experiment was run under conditions that simulated total uncertainty.
The results were that subjects underestimated density in regions of
relatively high density and overestimated it in regions of low
density -- an interesting extension of previous results.

*This work appears as Part Four/One of this dissertation; see
pp. 174-189.

Let $u(d_i, \xi) = \sum_{j=1}^{m} \xi_j \, u(d_i, \omega_j)$. Then the
expected utility of $d_i$ is given by:

$$Eu(d_i) = K \int\!\!\int \cdots \int_{\Xi} u(d_i, \xi) \, d\Xi. \tag{9}$$

This is equal to
$\sum_{j=1}^{m} E(\xi_j) \, u(d_i, \omega_j) = (1/m) \sum_{j=1}^{m} u(d_i, \omega_j)$,
since $u(d_i, \xi)$ is a linear function of the random variables
$\xi_j$. Thus, taking the view of total ignorance adopted herein, we
arrive by a different route at the decision rule advocated by Bernoulli
and Laplace and axiomatized in Chernoff [21].
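The reduction of the integral over the simplex to the simple average of the utilities can be illustrated by simulation. The sketch below draws probability vectors uniformly from the simplex (by normalizing independent exponential variates, a standard construction) and compares the simulated expected utility with the Laplace average. The utility row, sample size, and seed are hypothetical choices for the example.

```python
import random

def sample_uniform_simplex(m, rng):
    """Draw xi uniformly from the simplex by normalizing m exponential variates."""
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]

def laplace_expected_utility(row):
    """(1/m) * sum_j u(d_i, w_j): expected utility under total ignorance."""
    return sum(row) / len(row)

def monte_carlo_expected_utility(row, n_samples=20000, seed=0):
    """Average of sum_j xi_j * u_j over xi drawn uniformly from the simplex."""
    rng = random.Random(seed)
    m = len(row)
    total = 0.0
    for _ in range(n_samples):
        xi = sample_uniform_simplex(m, rng)
        total += sum(p * u for p, u in zip(xi, row))
    return total / n_samples

if __name__ == "__main__":
    row = [10.0, 0.0, 2.0]  # hypothetical utilities u(d_i, w_1), u(d_i, w_2), u(d_i, w_3)
    print(laplace_expected_utility(row), monte_carlo_expected_utility(row))
```

The two figures agree up to sampling error, as the linearity argument predicts.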


Decisions  Under  Partial  Ignorance 

Partial ignorance exists in a given formulation of a decision if
we neither know the probability distribution over $\Omega$ nor are in
total ignorance of it. If we are given $f(\xi_1, \ldots, \xi_m)$, the
density over $\Xi$, computation of $Eu(d_i)$ under partial ignorance is
in principle straightforward and proceeds along lines similar to those
developed in the previous section. Equation (9) is modified in the
obvious way to:

$$Eu(d_i) = \int\!\!\int \cdots \int_{\Xi} f(\xi) \, u(d_i, \xi) \, d\Xi. \tag{10}$$

If f is any of the large variety of appropriate forms indicated just
prior to equation (5), the integral in (10) may be easily evaluated
using Dirichlet integration; otherwise more cumbersome techniques must
be used.

In practice it seems clear that unless the decision-maker has
remarkable intuition, the density f will be most difficult to specify
from the partial information at hand. Fortunately there is an
alternative to determining f directly.

Jeffrey [38, pp. 183-190], in discussing degree of confidence of
a probability estimate, describes the following method for obtaining
the distribution function, $F_i(\xi_i)$, for a probability. Have the
decision-maker indicate for numerous values of $\xi_i$ what his
subjective estimate is that the "true" value of $\xi_i$ is less than
the value named. To apply this to a decision problem the distribution
function -- and hence $f_i(\xi_i)$ -- for each of the $\xi_i$'s must be
obtained. Next, the expectations of the $\xi_i$'s must be computed and,
from them, the expected utilities of the $d_i$'s can be determined. In
this way partial information is processed to lead to a Bayesian
decision under partial ignorance.

It should be clear that the decision-maker is not free to choose
the $f_i$'s subject only to the condition that for each $f_i$,
$\int_0^1 f_i(\xi_i) \, d\xi_i = 1$. Consider the example of the
misguided decision-maker who believed himself to be in total ignorance
of the probability distribution over 3 states of nature. Since he was
in total ignorance, he reasoned, he must have a uniform p.d.f. for each
$\xi_i$. That is, $f_1(\xi_1) = f_2(\xi_2) = f_3(\xi_3) = 1$ for
$0 \le \xi_i \le 1$. If he believes these to be the p.d.f.s, he should
be willing to simultaneously take even odds on bets that
$\xi_1 > 1/2$, $\xi_2 > 1/2$, and $\xi_3 > 1/2$. I would gladly take
these three bets, for under no conditions could I fail to have a net
gain. This example illustrates the obvious -- certain conditions must
be placed on the $f_i$'s in order that they be coherent. A necessary
condition for coherence is indicated below; I have not yet derived
sufficient conditions.

Consider a decision, $d_k$, that will result in a utility of 1 for
each $\omega_j$. Clearly, then, $Eu(d_k) = 1$. However, $Eu(d_k)$ also
equals $E(\xi_1) u(d_k, \omega_1) + \cdots + E(\xi_m) u(d_k, \omega_m)$.
Since for $1 \le i \le m$, $u(d_k, \omega_i) = 1$, a necessary
condition for coherence of the $f_i$'s is that
$\sum_{i=1}^{m} E(\xi_i) = 1$, a reasonable thing to expect. That this
condition is not sufficient is easily illustrated with two states of
nature. Suppose that $f_1(\xi_1)$ is given. Since $\xi_2 = 1 - \xi_1$,
$f_2$ is uniquely determined given $f_1$. However, it is obvious that
infinitely many $f_2$'s will satisfy the condition that
$E(\xi_2) = 1 - E(\xi_1)$, and if a person were to have two distinct
$f_2$'s it would be easy to make a book against him; his beliefs would
be incoherent.
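The three-state betting example above can be made concrete. Because the three probabilities sum to one, at most one of them can exceed 1/2, so whoever takes the other side of all three even-odds bets wins at least two of them. The sketch below checks this over many randomly drawn probability vectors; the unit stake, the sampling scheme, and the names are illustrative assumptions.

```python
import random

def opponent_net_gain(xi, stake=1.0):
    """Net gain to the person betting against 'xi_i > 1/2' at even odds on every i."""
    agent_wins = sum(1 for p in xi if p > 0.5)
    agent_losses = len(xi) - agent_wins
    return stake * (agent_losses - agent_wins)

def worst_case_gain(n_trials=10000, seed=0):
    """Smallest opponent gain seen over randomly drawn three-state distributions."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(n_trials):
        draws = [rng.expovariate(1.0) for _ in range(3)]
        total = sum(draws)
        xi = [d / total for d in draws]
        worst = min(worst, opponent_net_gain(xi))
    return worst

if __name__ == "__main__":
    print(opponent_net_gain([0.6, 0.3, 0.1]))  # one bet lost, two won
    print(worst_case_gain())
```

The simulation is only a spot check; the guarantee itself follows from the constraint that the components sum to one.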

If m is not very large, it would be possible to obtain conditional
densities of the form $f_2(\xi_2 | \xi_1)$,
$f_3(\xi_3 | \xi_1, \xi_2)$, etc., in a manner analogous to that
discussed by Jeffrey. If the conditional densities were obtained, then
$f(\xi)$ would be given by the following expression:

$$f(\xi) = f_1(\xi_1) \, f_2(\xi_2 | \xi_1) \cdots f_m(\xi_m | \xi_1, \ldots, \xi_{m-1}). \tag{11}$$

A sufficient condition that the $f_i$'s be coherent is that the
integral of f over $\Xi$ be unity; if it differs from unity, one way to
bring about coherence would be to multiply f by the appropriate
constant and then find the new $f_i$'s. If m is larger than 4 or 5,
this method of insuring coherence will be hopelessly unwieldy.
Something better is needed.

At this point I would like to discuss alternatives and objections
to the theory of decisions under partial information that is developed
here. The notion of probability distributions over probability
distributions has been around for a long time; Knight, Lindahl, and
Tintner explicitly used the notion in economics some time ago (see
Tintner [71]). This work has not, however, been formulated in terms of
decision theory. Hodges and Lehmann [28] have proposed a decision rule
for partial ignorance that combines the Bayesian and minimax approaches.
Their rule chooses the $d_i$ that maximizes $Eu(d_i)$ for some best
estimate (or expectation) of $\xi$, subject to the condition that the
minimum utility possible for $d_i$ is greater than a preselected value.
This preselected value is somewhat less than the minimax utility; the
amount less increases with our confidence that $\xi$ is the correct
distribution over $\Omega$. Ellsberg [24], in the lead article of a
spirited series in the Quarterly Journal of Economics, provides an
elaborate justification of the Hodges and Lehmann procedure, and I will
criticize his point of view presently.

Hurwicz [32] and Good (discussed in Luce and Raiffa [41], p. 305)
have suggested characterizing partial ignorance in the same fashion that
was later used by Atkinson, et al. [5]. That is, our knowledge of
$\xi$ is of the form $\xi \in \Xi_0$, where $\Xi_0$ is a subset of
$\Xi$. Hurwicz then proposes that we proceed as if in total ignorance
of where $\xi$ is in $\Xi_0$. In the spirit of the second section of
this paper, the decision rule could be Bayesian with $f(\xi) = K$ for
$\xi \in \Xi_0$ and $f(\xi) = 0$ elsewhere. Hurwicz suggests instead
utilization of non-Bayesian decision procedures; difficulties with
non-Bayesian procedures were alluded to in the introduction to
subsection III.

Let me now try to counter some objections that have been raised
against characterizing partial ignorance as probability distributions
over probabilities. Ellsberg [24, p. 659] takes the view that since
representing partial ignorance (ambiguity) as a probability distribution
over a distribution leads to an expected distribution, ambiguity must
be something different from a probability distribution. I fail to
understand this argument; ambiguity is high, it seems to me, if f is
relatively flat over $\Xi$, otherwise not. The "reliability,
credibility, or accuracy" of one's information simply determines how
sharply peaked f is. Even granted that probability is somehow
qualitatively different from ambiguity or uncertainty, the solution
devised by Hodges and Lehmann [28] and advocated by Ellsberg relies on
the decision-maker's completely arbitrary judgment of the amount of
ambiguity present in the decision situation. Ellsberg would have us
hedge against our uncertainty in $\xi$ by rejecting a decision that
maximized utility against the expected distribution but that has a
possible outcome with a utility below an arbitrary minimum. By the same
reasoning one could "rationally" choose one decision over the other in
the non-ambiguous problem below if, because of our uncertainty in the
outcome, we said (arbitrarily) that we would reject any decision with a
minimum gain of less than 3.


[Decision table: only the entries 5 and 25 are legible in the source.]

$E(\xi_1) = E(\xi_2) = .5$


I  would  reject  Ellsberg's  approach  for  the  simple  reason  that  its 
pessimistic  bias,  like  any  minimax  approach,  leads  to  decisions  that 
fail  to  fully  utilize  one's  partial  information. 

Savage [56, pp. 56-60] raises two objections to second-order
probabilities. The first, similar to Ellsberg's, is that even with
second-order probabilities expectations for the primary probabilities
remain. Thus we may as well have simply arrived at our best subjective
estimate of the primary probability, since it is all that is needed for
decision-making. This is correct as far as it goes but, without the
equivalent of second-order probabilities, it is impossible to specify
how the primary probability should change in the light of evidence.

Savage's second objection is that "...once second-order probabilities
are introduced, the introduction of an endless hierarchy seems
inescapable. Such a hierarchy seems very difficult to interpret, and it
seems at best to make the theory less realistic, not more." Luce and
Raiffa [41, p. 305] express much the same objection. An endless
hierarchy does not seem inescapable to me; we simply push the hierarchy
back as far as is required to be 'realistic.' In making a physical
measurement we could attempt to specify the value of the measurement,
the probable error in the measurement, the probable error in the
probable error, and on out the endless hierarchy. But it is not done
that way; probable errors usually seem to be about the right order of
realism. Similarly, I suspect that second-order probabilities will
suffice for most circumstances. However, in discussing concept
formation in Section V, I shall have occasion to use what are
essentially third-order probabilities.

Induction 

The preceding discussion has been limited to situations in which
the decision-maker has no option to experiment or otherwise acquire
information. When the possibility of experimentation is introduced, the
number of alternatives open to the decision-maker is greatly increased,
as is the complexity of his decision problem, for the decision-maker
must now decide which experiments to perform and in what order, when
to stop experimenting, and which course of action to take when
experimentation is complete. The problem of using the information
acquired is the problem of induction.

If we are quite certain that $\xi$ is very nearly the true probability
distribution over $\Omega$, additional evidence will little change our
beliefs. If, on the other hand, we are not at all confident about
$\xi$ -- if f is fairly flat -- new evidence can change our beliefs
considerably. (New evidence may leave the expectations for the
$\xi_i$'s unaltered even though it changes beliefs by making f more
sharp. In general, of course, new evidence will both change the
sharpness of f and change the expectations of the $\xi_i$'s.) Without
the equivalent of second-order probabilities there appears to be no
answer to the question of exactly how new evidence can alter
probabilities. Suppes [62] considers an important defect of both his
and Savage's [56] axiomatizations of subjective probability and utility
to be their failure to specify how prior information is to be used. Let
us consider an example used by both Suppes and Savage.

A man must decide whether to buy some grapes which he knows to be
either green ($\omega_1$), ripe ($\omega_2$), or rotten ($\omega_3$).
Suppes poses the following question: If the man has purchased grapes at
this store 15 times previously, and has never received rotten grapes,
and has no information aside from these purchases, what probability
should he assign to the outcome of receiving rotten grapes the 16th
time?

Prior to his first purchase, the man was in total ignorance of the
probability distribution over the states of nature. Thus from equation
(8) we see that the density for $\xi_3$, the prior probability of
receiving rotten grapes, should be $f_3(\xi_3) = 2 - 2\xi_3$. Let X be
the event of receiving green or ripe grapes on the first 15 purchases;
the probability that X occurs, given $\xi_3$, is
$p(X \mid \xi_3) = (1 - \xi_3)^{15}$. What we desire is
$f_3(\xi_3 \mid X)$, the density for $\xi_3$ given X, and this is
obtained by Bayes' theorem in the following way:

$$f_3(\xi_3 \mid X) = \frac{p(X \mid \xi_3)\, f_3(\xi_3)}{\int_0^1 p(X \mid \xi_3)\, f_3(\xi_3)\, d\xi_3} \qquad (12)$$

After inserting the expressions for $f_3(\xi_3)$ and $p(X \mid \xi_3)$,
equation (12) becomes:

$$f_3(\xi_3 \mid X) = \frac{(1 - \xi_3)^{15}(2 - 2\xi_3)}{\int_0^1 (1 - \xi_3)^{15}(2 - 2\xi_3)\, d\xi_3}.$$

Performing the integration and simplifying gives
$f_3(\xi_3 \mid X) = 17(1 - \xi_3)^{16}$; from this the expectation of
$\xi_3$ given X can be computed --

$$E(\xi_3 \mid X) = 17 \int_0^1 \xi_3 (1 - \xi_3)^{16}\, d\xi_3 = 1/18.$$

(Notice that this result differs from the 1/17 that Laplace's law of
succession would give. The difference is due to the fact that the
Laplacian law is derived from consideration of only two states of
nature--rotten and not rotten.⁷)
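
The arithmetic of the grapes example can be checked exactly. The Python
sketch below is an illustration rather than part of the original
argument; the function names are assumptions, and it uses only the Beta
integral B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers together
with the generalized law of succession, (n+1)/(r+m).

```python
from fractions import Fraction
from math import factorial


def beta_fn(a, b):
    """Exact Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def posterior_mean_rotten(clean_purchases):
    """E(xi_3 | X) when the prior density is 2 - 2*xi_3 and X is
    `clean_purchases` purchases with no rotten grapes."""
    n = clean_purchases
    # Unnormalized posterior is (1 - x)**(n + 1), a Beta(1, n + 2) kernel.
    numerator = beta_fn(2, n + 2)    # integral of x * (1 - x)**(n + 1)
    denominator = beta_fn(1, n + 2)  # integral of (1 - x)**(n + 1)
    return numerator / denominator


def generalized_succession(n, r, m):
    """(n + 1) / (r + m): n occurrences in r trials, m states of nature."""
    return Fraction(n + 1, r + m)


print(posterior_mean_rotten(15))         # three states: green, ripe, rotten
print(generalized_succession(0, 15, 3))  # same value from the general rule
print(generalized_succession(0, 15, 2))  # Laplace's two-state rule
```

The first two lines printed agree at 1/18, while the two-state rule
gives 1/17, which is the discrepancy the parenthetical remark describes.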

My purpose in this section was to show why second-order probability
distributions are useful in thinking about a subjectivistic theory of
induction, and I have outlined the nature of such a theory.

IV. SUBJECTIVISTIC INTERPRETATION OF CARNAP'S INDUCTIVE SYSTEM

Rudolf Carnap [16] has devised a system of inductive logic* that fits
within the framework of the logical theory of probability. The purpose
of this section is to show that Carnap's system can be interpreted in a
straightforward way as a special case of the subjectivistic theory of
induction presented in the preceding section. That it can be so
interpreted does not imply, of course, that it must be so interpreted.
Let me begin by informally sketching Carnap's $\lambda$ continuum of
inductive methods.


Carnap's system is built around a "language" that contains names of n
individuals -- $x_1, x_2, \ldots, x_n$ -- and $\pi$ one-place primitive
predicates -- $P_1, P_2, \ldots, P_\pi$. Of each individual it may be
said that it either does or does not instantiate each characteristic,
i.e., for all i ($1 \le i \le n$) and all j ($1 \le j \le \pi$), either
$P_j(x_i)$ or $\neg P_j(x_i)$. If, for example, $P_1$ is "is red," then
$x_1$ is either red or it isn't.

A "Q-predicate" is defined as a conjunction of $\pi$ primitive
characteristics such that each primitive predicate or its negation
appears in the conjunction. Let $\kappa$ be the number of Q-predicates;
clearly, $\kappa = 2^\pi$. The following are the Q-predicates if
$\pi = 2$:

$$Q_1 = P_1 \,\&\, P_2, \quad Q_2 = P_1 \,\&\, \neg P_2, \quad Q_3 = \neg P_1 \,\&\, P_2, \quad Q_4 = \neg P_1 \,\&\, \neg P_2 \qquad (13)$$


*In  a  still  unpublished  manuscript  Carnap  [18]  extends  his  original 
system  in  a  number  of  ways,  some  similar  to  those  suggested  here. 


If $P_1$ is "is red" and $P_2$ is "is square" then, for example,
$Q_4(x_i)$ means that $x_i$ is neither red nor square, etc.

The Q properties represent the strongest statements that can be made
about the individuals in the system; once an individual has been
asserted to instantiate a Q-predicate, nothing further can be said
about it within the language. Weaker statements about individuals may
be formed by taking disjunctions of Q-predicates. To continue the
preceding example, if we let $M = Q_1 \vee Q_2 \vee Q_3$, then $M(x_i)$
is true if $x_i$ is either red or square or both. Any
non-self-contradictory characteristic of an individual that can be
described in the language can be expressed as a disjunction of
Q-predicates.

The logical width, w, of a predicate, say M, is the number of
Q-predicates in the disjunction of Q-predicates equivalent to M. Its
relative width is defined to be $w/\kappa$. If M is as defined in the
preceding paragraph, its logical width would be 3 and its relative width
3/4. A predicate equivalent to the disjunction of all the Q-predicates
in the system is tautologically true and its relative width is 1. The
logical width of a predicate that cannot be instantiated (like
$P_1 \,\&\, \neg P_1$) is zero. In some sense, then, the greater the
relative width of a predicate the more likely it is to be true of any
given individual. Notice that the relative width of any primitive
predicate, $P_i$, is 1/2, whatever the value of $\pi$.
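
Logical and relative widths can be found by brute-force enumeration
when the number of primitive predicates is small. The following Python
sketch is only an illustration; the function names and the encoding of a
Q-predicate as a tuple of truth values are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product


def q_predicates(num_primitives):
    """Every assignment of True/False to the primitive predicates;
    each tuple stands for one Q-predicate."""
    return list(product([True, False], repeat=num_primitives))


def logical_width(predicate, num_primitives):
    """Number of Q-predicates under which `predicate` holds."""
    return sum(1 for q in q_predicates(num_primitives) if predicate(*q))


def relative_width(predicate, num_primitives):
    return Fraction(logical_width(predicate, num_primitives),
                    2 ** num_primitives)


def red_or_square(red, square):     # red, square, or both
    return red or square


def is_red(red, square):            # a primitive predicate
    return red


def red_and_not_red(red, square):   # cannot be instantiated
    return red and not red


print(logical_width(red_or_square, 2), relative_width(red_or_square, 2))
print(relative_width(is_red, 2), logical_width(red_and_not_red, 2))
```

With two primitive predicates this reports width 3 and relative width
3/4 for "red or square", relative width 1/2 for the primitive predicate,
and width 0 for the contradiction.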

Let us turn now to the inductive aspects of the system. Suppose that we
are interested in some property M and have seen a sample of size s of
individuals, of whom $s_M$ had the property M. What are we to think of
the (logical) probability that the next individual that we observe will
have the property M? Carnap suggests that two factors enter into
assessing this probability. The first is an empirical factor, $s_M/s$,
which is the observed fraction of individuals having property M. The
second is a logical factor, independent of observation, and equal to
the relative width of M -- $w/\kappa$. A weighted average of these two
factors gives the probability that the s+1st individual, $x_{s+1}$,
will have M. One of the factor weightings may be arbitrarily chosen
and, for convenience, Carnap chooses the weight of the empirical factor
to be s. The weight of the logical factor is given by a parameter
$\lambda$ ($\lambda$ may be some function $\lambda(\kappa)$, but we need
not go into that). Thus we have:


$$\mathrm{prob}(M(x_{s+1}) \text{ is true}) = \frac{s_M + \lambda w/\kappa}{s + \lambda} \qquad (14)$$

The limiting value of the expression in (14) as $\lambda$ gets very
large is $w/\kappa$, i.e., only the logical factor counts. If, on the
other hand, $\lambda = 0$ then the logical factor has no weight at all
and only empirical considerations count. Thus the parameter $\lambda$
indexes a continuum of inductive methods -- from those giving all weight
to the logical factor to those giving it none.
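
The behavior of (14) at the two ends of the continuum can be seen
numerically. The Python sketch below is illustrative only: the function
name is an assumption, and the sample (8 of 10 individuals having M,
with w = 3 and four Q-predicates) is hypothetical.

```python
from fractions import Fraction


def carnap_probability(s_m, s, width, kappa, lam):
    """Equation (14): (s_M + lam * w / kappa) / (s + lam)."""
    logical_factor = Fraction(width, kappa)
    return (s_m + lam * logical_factor) / (s + lam)


# Hypothetical sample: 8 of 10 individuals had M, with w = 3 and kappa = 4.
for lam in (0, 4, 1000):
    print(lam, carnap_probability(8, 10, 3, 4, lam))
```

At lam = 0 the value is the observed fraction 8/10, and as lam grows it
approaches the relative width 3/4.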

Subjectivistic Interpretation of the $\lambda$ System

There are $\kappa = 2^\pi$ Q-predicates in the Carnap system. The
Q-predicates may be numbered $Q_1, \ldots, Q_\kappa$. Let $\xi_i$ be the
(subjective) probability that any individual will instantiate $Q_i$.
The probabilities $\xi_i$ may be unknown and, following the precedent of
the preceding section, we may represent our knowledge of these
probabilities by a density f defined on $\xi_1, \ldots, \xi_\kappa$.
Since $\xi_\kappa = 1 - \xi_1 - \cdots - \xi_{\kappa-1}$, the density
need only be defined on a $\kappa - 1$ dimensioned region analogous to
the region A in figure 1. The densities we shall consider will be
Dirichlet densities, so let us now define these densities and examine
some of their properties.

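The $\kappa - 1$ variate Dirichlet density is defined for all points
$(\xi_1, \ldots, \xi_{\kappa-1})$ such that $\xi_i \ge 0$ and
$\sum_{i=1}^{\kappa-1} \xi_i \le 1$. The density has $\kappa$
parameters -- $\nu_1, \ldots, \nu_\kappa$ -- and is defined as follows:

$$f(\xi_1, \ldots, \xi_{\kappa-1}) = \frac{\Gamma(\sum \nu_i)}{\prod \Gamma(\nu_i)}\, \xi_1^{\nu_1 - 1} \cdots \xi_{\kappa-1}^{\nu_{\kappa-1} - 1} (1 - \xi_1 - \cdots - \xi_{\kappa-1})^{\nu_\kappa - 1} \qquad (15)$$

where the sums ($\sum$) and products ($\prod$) are over all the
$\nu_i$, and the $\Gamma$ denotes the gamma function. Let us let
$\nu_i = \lambda/\kappa$ for $1 \le i \le \kappa$ and see what happens.
First we need two theorems proved in Wilks [73, pp. 177-182]: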

Theorem 1. If $\xi_i$ is a random variable in the density given in
(15) then $E(\xi_i) = \nu_i / \sum_{j=1}^{\kappa} \nu_j$.

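Theorem 2. If $(\xi_1, \ldots, \xi_{\kappa-1})$ is a vector random
variable having a $\kappa - 1$ variate Dirichlet density with parameters
$\nu_1, \ldots, \nu_\kappa$, then the random variable
$(z_1, \ldots, z_s)$, where $z_1 = \xi_1 + \cdots + \xi_{j_1}$,
$z_2 = \xi_{j_1+1} + \cdots + \xi_{j_1+j_2}$, ...,
$z_s = \xi_{j_1+\cdots+j_{s-1}+1} + \cdots + \xi_{j_1+\cdots+j_s}$,
has an s variate Dirichlet distribution with parameters
$\theta_1, \ldots, \theta_{s+1}$, where
$\theta_1 = \nu_1 + \cdots + \nu_{j_1}$,
$\theta_2 = \nu_{j_1+1} + \cdots + \nu_{j_1+j_2}$, ...,
$\theta_s = \nu_{j_1+\cdots+j_{s-1}+1} + \cdots + \nu_{j_1+\cdots+j_s}$,
and $\theta_{s+1} = \nu_{j_1+\cdots+j_s+1} + \cdots + \nu_\kappa$.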

Finally we need one more standard theorem about Dirichlet distributions
that concerns modification of the density by Bayes' theorem in the
light of new evidence. This theorem too will be stated without proof.

Theorem 3. If $\xi_1, \ldots, \xi_{\kappa-1}$ are the probabilities of
the Q-predicates $Q_1, \ldots, Q_{\kappa-1}$ and if
$1 - \xi_1 - \cdots - \xi_{\kappa-1}$ is the probability of $Q_\kappa$,
if the prior density for the $\xi_i$s is a Dirichlet with parameters
$\nu_1, \ldots, \nu_\kappa$, and if an observation of s individuals is
made in which $s_i$ have property $Q_i$ ($\sum s_i = s$), then the
posterior density for the $\xi_i$s is a Dirichlet with parameters
$\nu'_1, \ldots, \nu'_\kappa$ where $\nu'_i = \nu_i + s_i$ for
$1 \le i \le \kappa$.
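
The three theorems reduce to simple operations on the parameter vector,
which the following Python sketch makes concrete. It is an illustration
under stated assumptions: the function names are introduced here, a
Dirichlet density is represented only by its list of parameters, and the
split of the 15 grape purchases into 6 green and 9 ripe is hypothetical.

```python
from fractions import Fraction


def dirichlet_mean(params):
    """Theorem 1: E(xi_i) = nu_i / sum of all nu_j."""
    total = sum(params)
    return [Fraction(v) / total for v in params]


def dirichlet_aggregate(params, groups):
    """Theorem 2: summing the probabilities in each group gives a
    Dirichlet whose parameters are the corresponding sums of the nu_i."""
    return [sum(params[i] for i in group) for group in groups]


def dirichlet_update(params, counts):
    """Theorem 3: posterior parameters are nu_i + s_i."""
    return [v + s_i for v, s_i in zip(params, counts)]


prior = [1, 1, 1]     # uniform density over (green, ripe, rotten)
counts = [6, 9, 0]    # hypothetical split of 15 purchases, none rotten
posterior = dirichlet_update(prior, counts)
print(dirichlet_mean(posterior)[2])               # expected prob. of rotten
print(dirichlet_aggregate(prior, [[0, 1], [2]]))  # (not rotten, rotten)
```

The posterior expectation for rotten grapes comes out as 1/18 whatever
the green/ripe split, and aggregating the uniform prior gives parameters
(2, 1), i.e. the density 2 - 2x for the probability of rotten grapes.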

With this mathematical apparatus at hand we can readily show that
Carnap's $\lambda$ continuum is formally identical to a subjectivist
inductive system when the prior on the $\xi_i$s is a Dirichlet density
with all its parameters equal to $\lambda/\kappa$,⁸ i.e.,
$\nu_i = \lambda/\kappa$ for all i.

Consider first induction involving only Q-predicates rather than more
general predicates. When s = 0 -- before we make any observations -- by
theorem 1 $E(\xi_i) = 1/\kappa$ for all i. If we observe a sample, X,
of size s, in which $Q_i$ appears $s_i$ times then, by theorem 3,
$\nu'_i = (\lambda/\kappa) + s_i$ and
$\sum_j \nu'_j = \kappa(\lambda/\kappa) + s = \lambda + s$. By theorem 1
again:

$$E(\xi_i \mid X) = \frac{s_i + \lambda/\kappa}{s + \lambda} \qquad (16)$$

Since the logical width, w, of a Q-predicate is 1, (16) is clearly the
same as (14) when the predicate M referred to there is a Q-predicate.

To deal with predicates more complicated than Q-predicates we need
theorem 2. Consider a predicate M with logical width w; $\neg M$, then,
has logical width $\kappa - w$. By theorem 2 the prior density function
for $\xi_M$ (the probability of M) will be the one variate Dirichlet (or
beta) density with parameters $\nu_1 = w\lambda/\kappa$ and
$\nu_2 = (\kappa - w)\lambda/\kappa$. By theorem 1 the prior
expectation of $\xi_M$ is what it should be:

$$E(\xi_M) = \frac{w\lambda/\kappa}{w\lambda/\kappa + (\kappa - w)\lambda/\kappa} = w/\kappa.$$

If we observe a sample X, of size s, that has a total of $s_M$ instances
of M (and, therefore, $s - s_M$ instances of $\neg M$) then
$\nu'_1 = w\lambda/\kappa + s_M$ and
$\nu'_2 = (\kappa - w)\lambda/\kappa + s - s_M$. Using theorem 1 again
we obtain:

$$E(\xi_M \mid X) = \frac{\nu'_1}{\nu'_1 + \nu'_2} = \frac{s_M + \lambda w/\kappa}{s + \lambda}, \qquad (17)$$

which is essentially the same as (14).
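
The agreement between the Dirichlet route and (14) can be checked
numerically. The Python sketch below is illustrative: the function names
are assumptions, the counts are hypothetical, and the weight of the
logical factor is restricted to integers so that the arithmetic stays
exact.

```python
from fractions import Fraction


def carnap_probability(s_m, s, width, kappa, lam):
    """Equation (14)."""
    return (s_m + lam * Fraction(width, kappa)) / (s + lam)


def dirichlet_predictive(counts, member_indices, lam):
    """Posterior expectation of the probability of M, where M is the
    disjunction of the Q-predicates listed in `member_indices`."""
    kappa = len(counts)
    prior = [Fraction(lam, kappa)] * kappa                  # nu_i = lam / kappa
    posterior = [v + s_i for v, s_i in zip(prior, counts)]  # theorem 3
    in_m = sum(posterior[i] for i in member_indices)        # theorem 2
    return in_m / sum(posterior)                            # theorem 1


counts = [5, 2, 0, 3]    # hypothetical counts for four Q-predicates
members = [0, 1, 2]      # M is a disjunction of three of them, so w = 3
s_m = sum(counts[i] for i in members)
for lam in (1, 4, 10):
    assert dirichlet_predictive(counts, members, lam) == carnap_probability(
        s_m, sum(counts), len(members), len(counts), lam)
print(dirichlet_predictive(counts, members, 4))
```

For each weight tried, the two computations return the same fraction.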

Leaving aside debate concerning the relative philosophical merits of
the logical vs. subjective views, the subjectivist approach has two
important advantages over the $\lambda$ system. These are:

1. In the Carnapian system $\nu_i = \nu_j$ for all i and j; this clearly
much reduces the range of possible prior distributions. Or, to put this
another way, Carnap's 1 dimensional continuum of inductive methods is a
special case of a $\kappa$ dimensional continuum.

2. Second, it may be desirable to have predicates in the language that
are not dichotomous. For example, instead of saying of $x_i$ that it is
red or not red, we may wish to say that it is red, puce, or
ultramarine. If we denote by $V(P_j)$ the number of alternatives $P_j$
may take on, then the number of Q-predicates we have, $\kappa$, is given
by:

$$\kappa = \prod_{j=1}^{\pi} V(P_j), \qquad (18)$$

where, as before, $\pi$ is the number of predicates. Clearly the
subjective approach can handle any finite value of $V(\cdot)$.

V. CONCEPT FORMATION AND INDUCTION

My purpose in this section is to provide an essentially Bayesian
mechanism for certain types of concept formation. It turns out that
this task is closely related to providing a subjectivistic
generalization of Hintikka's [26] two dimensional continuum of inductive
methods, and I shall begin by briefly describing his work. Next I shall
provide a subjectivistic interpretation of it, then show how all this
relates to concept formation.

Hintikka's  Two  Dimensional  Continuum  of  Inductive  Methods 

Consider a predicate M (a disjunction of several Q-predicates) and
suppose that we have observed several thousand individuals, that all of
them have instantiated M, and that there exists no M' with logical width
less than M such that all the observed individuals also instantiated M'.
Having seen several thousand instances of M, and none of $\neg M$, we
may very well wish to assign a non-zero probability to the assertion
that all of the (infinite number of) individuals in this series
exemplify M. This cannot be done in the Carnapian system (unless M is
tautologous) or in the subjectivistic generalization of it that I
outlined; that is, what is known as inductive generalization is
impossible in these systems. Hintikka's [26] purpose is to generalize
the Carnapian system in such a way that inductive generalization is
possible.

Hintikka defines a "constituent" in the following way: the constituent
C(i,j,k) is true if and only if

$$(\exists x)Q_i(x) \,\&\, (\exists x)Q_j(x) \,\&\, (\exists x)Q_k(x) \,\&\, (x)[Q_i(x) \vee Q_j(x) \vee Q_k(x)]$$

is true. Referring back to equation (13), C(1,3) would mean that all
individuals have the property $P_2$, some have $P_1$ and some don't have
$P_1$. $C(\cdot)$ may have any number of arguments from 1 to $\kappa$;
let us denote by $C_w$ any constituent that asserts that exactly w
Q-predicates are instantiated. $\binom{\kappa}{w}$ is the number of
different constituents there are with exactly w Q-predicates
instantiated. The total number of constituents, N, is, therefore, given
by:

$$N = \sum_{w=1}^{\kappa} \binom{\kappa}{w} = 2^\kappa - 1. \qquad (19)$$


Assume that a total of c different Q-predicates have been observed in a
sample, e, of size n. Consider a constituent $C_w$. Following Hintikka,
we obtain by Bayes' theorem the posterior probability for $C_w$ given e,
under the assumption that the prior probability of a constituent depends
only on the number of Q-predicates in it:

$$p(C_w \mid e) = \frac{p(C_w)\, p(e \mid C_w)}{\sum_{i=0}^{\kappa-c} \binom{\kappa-c}{i}\, p(C_{c+i})\, p(e \mid C_{c+i})} \qquad (20)$$

where $p(C_w)$ is the prior probability of a constituent containing w
Q-predicates. (Equation (20) corrects some typographical mistakes in
Hintikka's equation (2).)
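
To see how a posterior over constituents of this kind behaves, the
following Python sketch applies Bayes' theorem to constituents that are
compatible with the evidence. It is a sketch under explicit assumptions
that are not taken from the text: the likelihood is that of a symmetric
Dirichlet density with every parameter equal to lam/w on the w
Q-predicates a constituent allows (with lam held constant across w),
the prior is uniform over the 15 constituents of a four-Q-predicate
system, and all function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def rising(x, k):
    """Rising factorial x (x + 1) ... (x + k - 1)."""
    result = Fraction(1)
    for step in range(k):
        result *= x + step
    return result


def likelihood(counts, w, lam):
    """p(e | C_w) for one observed sequence, assuming a symmetric
    Dirichlet density (every parameter lam / w) on the w Q-predicates
    that the constituent allows."""
    numerator = Fraction(1)
    for count in counts:
        numerator *= rising(Fraction(lam, w), count)
    return numerator / rising(Fraction(lam), sum(counts))


def posterior(counts, kappa, lam, prior):
    """Posterior of one compatible constituent of each size w = c..kappa,
    where `counts` lists the occurrences of the c Q-predicates seen."""
    c = len(counts)
    joint = {w: prior(w) * likelihood(counts, w, lam)
             for w in range(c, kappa + 1)}
    # comb(kappa - c, w - c) constituents of size w contain the c seen.
    denominator = sum(comb(kappa - c, w - c) * joint[w] for w in joint)
    return {w: joint[w] / denominator for w in joint}


def uniform_prior(w):
    return Fraction(1, 2 ** 4 - 1)   # hypothetical: 15 equally likely constituents


few = posterior([2], kappa=4, lam=4, prior=uniform_prior)
many = posterior([10], kappa=4, lam=4, prior=uniform_prior)
print(float(few[1]), float(many[1]))
```

When a single Q-predicate has been seen every time, the posterior of the
constituent asserting that only that Q-predicate is instantiated rises
with the sample size, which is the inductive generalization at issue.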

Hintikka makes two assumptions to obtain the prior probabilities
$p(C_w)$ and the likelihood $p(e \mid C_w)$. As noted, unless
$w = \kappa$, $p(C_w) = 0$ in the Carnapian system with an infinite
number of individuals. Hintikka uses as $p(C_w)$ the (non-zero) number
that $p(C_w)$ would be in a Carnapian universe with $\alpha$
individuals. Thus he obtains a family of priors indexed by $\alpha$
running from 0 to $\infty$. To obtain $p(e \mid C_w)$ he makes the same
assumptions as in the Carnapian system except that he allows only w
instead of $\kappa$ Q-predicates. In this way Hintikka allows for the
possibility of inductive generalization. A low $\alpha$ corresponds to a
prior expectation of a highly ordered universe in which but few
Q-predicates are instantiated; a high $\alpha$ corresponds to a prior
expectation that almost all the Q-predicates will be instantiated.
Carnap's system is the special case of Hintikka's obtained by letting
$\alpha \to \infty$.

Subjectivistic Interpretation of Hintikka's System

From (19) we see that there are $N = 2^\kappa - 1$ different
constituents; let us label them $C_1, \ldots, C_N$, letting $C_N$ be the
constituent containing all $\kappa$ Q-predicates. To each $C_i$ let us
assign a w-variate Dirichlet density where, as before, w is the number
of Q-predicates $C_i$ asserts to exist. (A 1-variate Dirichlet density
is assumed to be an impulse or $\delta$ function.) The Dirichlet density
corresponding to $C_i$, which I shall call $D_i$, is assumed to hold
given that $C_i$ is true. $D_i$ is a p.d.f. for the probabilities
$\xi_j$ of the Q-predicates contained in $C_i$. Let
$\zeta = (\zeta_1, \ldots, \zeta_N)$ be a vector that gives the prior
probabilities of the $C_i$s, i.e., $p(C_i) = \zeta_i$. We thus have
third order probabilities--$\zeta_i$ corresponds to the probability that
$D_i$ is the correct p.d.f. for the probabilities $\xi_j$. If
$\zeta_N = 1$ and, hence, all the other $\zeta_i$s equal zero, we have
the subjective system outlined previously in this paper. If all the
$D_i$s are equal for constituents containing the same number of
Q-predicates, if each $D_i$ has all its parameters equal to one another,
if all the predicates are dichotomous, and if $\zeta$ is contained in a
certain subset of its possible values, then the system outlined here
reduces to Hintikka's two dimensional continuum. Development of
mathematical detail must await another time.

Concept  Formation  and  Induction 

In lectures at Stanford University, Professor Patrick Suppes developed
what he calls the "template" representation of a concept. This has been
further developed in a recent paper by Roberts and Suppes [53]. His
lectures centered around the psychological problem of describing how
people actually do acquire concepts. A typical experimental paradigm
would be something like the following: A subject is shown geometrical
figures that differ in size, form, and color. After he is shown a
figure he must say whether the figure belongs to class "A" or whether
it does not. After making his response, the subject is told the correct
answer, then shown a new figure.

Let us assume there are three sizes, three colors, and three forms.
Each figure can then be described by a Q-predicate; by equation (18) the
total number of Q-predicates is 27. To the three natural
predicates--size, form, and color--we can add the predicate "is a member
of class 'A'." Thus we have a new system with 54 Q-predicates. Suppose
the concept to be learned is "is aquamarine or triangular"; exactly one
of the $2^{54} - 1$ constituents exemplifies this concept. More
specifically, that constituent is
$(\exists x)[R(x) \,\&\, A(x)] \,\&\, (\exists x)[\neg R(x) \,\&\, \neg A(x)] \,\&\, (x)\{[R(x) \,\&\, A(x)] \vee [\neg R(x) \,\&\, \neg A(x)]\}$,
where R(x) is "x is aquamarine or triangular" and A(x) is "x is in class
'A'." An important question then is whether or not the subjectivistic
generalization of Hintikka's system can provide an adequate empirical
account of human concept formation. The possibility of a low value for
$\alpha$ (or its subjectivistic equivalent) makes it conceivable that
this approach could be adequate to account for the extremely rapid
concept learning that humans exhibit.
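
The counts in this example can be reproduced by enumeration. The Python
sketch below is an illustration; the particular attribute values are
hypothetical (the text fixes only that there are three of each), and the
variable names are assumptions.

```python
from itertools import product
from math import prod

# Hypothetical attribute values; the text fixes only how many there are.
SIZES = ["small", "medium", "large"]
COLORS = ["aquamarine", "red", "puce"]
FORMS = ["triangle", "square", "circle"]
CLASS_A = [True, False]

alternatives = [SIZES, COLORS, FORMS, CLASS_A]
kappa = prod(len(values) for values in alternatives)   # equation (18)


def in_concept(color, form):
    """R(x): x is aquamarine or triangular."""
    return color == "aquamarine" or form == "triangle"


# Q-predicates compatible with the concept: class label agrees with R.
allowed = [q for q in product(*alternatives)
           if in_concept(q[1], q[2]) == q[3]]
positive = [q for q in allowed if q[3]]

print(kappa, len(allowed), len(positive))
```

Of the 54 Q-predicates, 27 are compatible with the concept (one for each
figure, carrying its correct class label), and 15 of those describe
figures in class "A".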

Let me now suggest a fairly specific two parameter model for human
concept formation. The assumptions of the model are:

Assumption 1. On trial n the subject's state may be represented by a
vector $S_n = (s_1, \ldots, s_N)$ where N is the number of constituents
in the system and $s_i$ may be considered the subject's estimate of the
probability that constituent $C_i$ holds.

Assumption 2. With probability $\theta_1$, $S_{n+1}$ is computed from
$S_n$ and the most recently observed figure by means of (20); with
probability $1 - \theta_1$, $S_{n+1} = S_n$.

Assumption 3. When, on trial n, the subject is given a new figure to
respond to he computes from $S_n$ the probability that the figure is in
class "A". If this probability exceeds .5 he responds "A"; otherwise,
he responds "$\neg$A".

Assumption 4. All constituents containing an equal number of
Q-predicates have equal prior probabilities. The prior probability that
the true constituent will have j ($1 \le j \le \kappa$) Q-predicates is
proportional to $\theta_2(1 - \theta_2)^{j-1}$. (Large $\theta_2$
implies rapid inductive generalization or, in Hintikka's system, it
corresponds to small $\alpha$.) This assumption determines $S_1$.
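
The pieces of the model that do not depend on the details of (20) can be
sketched in Python as follows. This is an illustration only: the
function names are hypothetical, the Bayesian update is passed in as an
arbitrary function, and the size prior is normalized over j = 1, ...,
kappa, which is an assumption made here so that the weights sum to one.

```python
import random
from fractions import Fraction
from math import comb


def size_prior(kappa, theta2):
    """Assumption 4: weight theta2 * (1 - theta2)**(j - 1) on j
    Q-predicates, normalized here over j = 1..kappa (an assumption)."""
    weights = {j: theta2 * (1 - theta2) ** (j - 1)
               for j in range(1, kappa + 1)}
    total = sum(weights.values())
    return {j: weight / total for j, weight in weights.items()}


def constituent_prior(kappa, theta2):
    """Prior for any single constituent with w Q-predicates: the size
    weight is shared equally among the comb(kappa, w) such constituents."""
    return {w: p / comb(kappa, w)
            for w, p in size_prior(kappa, theta2).items()}


def next_state(state, bayes_update, theta1, rng):
    """Assumption 2: update with probability theta1, else keep the state."""
    return bayes_update(state) if rng.random() < theta1 else state


def respond(prob_in_class_a):
    """Assumption 3: respond "A" only if the probability exceeds .5."""
    return "A" if prob_in_class_a > 0.5 else "-A"


prior = constituent_prior(kappa=4, theta2=Fraction(1, 2))
print(prior[1], prior[4])
rng = random.Random(0)
print(next_state("old state", lambda s: "updated state", 1, rng))
print(respond(0.7), respond(0.5))
```

Under this normalization a larger value of the second parameter shifts
prior weight toward constituents with few Q-predicates.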

Given these four assumptions and estimated values of the parameters
$\theta_1$ and $\theta_2$, the subject's responses can be predicted from
the figures he has been shown and their classifications. It should be
clear, of course, that the model just outlined is but one of many
possible similar models.

I will close this section by posing two questions: (i) To what extent
can existing empirical models of concept formation be shown to be
special cases (or generalizations) of the model I have described? (ii)
What, if anything, would estimated values of $\theta_2$ tell us about
the true regularity of the universe we live in?

VI. CONCLUDING COMMENTS

I have attempted in this part to extend a subjectivistic theory of
induction in a way that allows the logical systems of Carnap and
Hintikka to appear as special cases. In the course of this effort I
have attempted to provide a definition of information that is adequate
from a subjective point of view and have extended the subjectivist
approach to account for certain types of concept formation. Yet there
is nothing in what I have said that would provide any fundamental
justification for utilizing information from the past to make
inferences concerning the future.

I will conclude by suggesting that theories of induction may be
lexicographically ordered according to how satisfactory they are. Along
the first dimension the criterion is "How well does the theory deal with
the problem posed by Hume?" All inductive systems are equally (and
totally) unsatisfactory from this point of view. Along the secondary
dimension the subjective theory is, though problems remain, probably
the best. But unsatisfactory is unsatisfactory: Hume's intellectual
successors are Sartre and Dylan.


FOOTNOTES

¹I realize that this is treating rather briefly a still ongoing debate
concerning the nature of probability. But entering into that discussion
here would take me too far afield.

²Two applications to psychology of the notion of information discussed
here should be mentioned; both relate to problems posed by David Hume
[31]. The first relates to Hume's distinction between simple and
complex impressions. Work reviewed by Miller [44] suggests a way of
making this distinction precise. Miller describes work that indicates
that the amount of information a human can process is strictly limited
and about the same for different dimensions; combining dimensions
provides means for increasing the information input. Simple impressions
might be defined, then, as impressions involving only one perceptual
dimension, and complex ones defined as involving more than one. The
problem here is to construct an algebra for combining perceptual
dimensions and one approach to this (that resolves an apparent
contradiction in the experimental literature) is suggested in Jamison
[33]. The second application of the notions of semantic information to
psychological problems posed by Hume is to the problem of distinguishing
between memory and imagination. Here we might say that something is
imagined if the amount of information concerning that something that a
person can supply is virtually unlimited. Otherwise, it is a memory.
This definition suffers from the defect, as Professor Suppes has pointed
out to me, that the more vivid a memory is, the more difficult will it
be to separate it from imagination.

³Usually we can characterize the uncertainty in a decision situation as
the sum of $H(E(\xi))$ and $H(f)$. If, however, f itself is not
precisely known, the uncertainty associated with alternative possible fs
must be added in, and so on.

⁴An important practical problem for the theory of subjective
probability is the problem of measuring subjective probabilities.
Suppes [67] suggests that a problem with the method of using wagers is
that persons will change the odds at which they will bet as the size of
their bet increases. A solution to this problem is to fix the size of
the person's bet, let him choose the odds, and have the experimenter
choose the side of the bet the subject must take (the "you divide, I
choose" principle). If the situation is such that the subject believes
that the experimenter knows more about the odds than he does, the
subject will be strongly motivated to give an accurate probability
assessment regardless of the amount he has at stake.

⁵Ronald Howard [29] utilizes what are essentially probability
distributions over probability distributions by considering a
probability density function for the parameters of another probability
density function. The notion of probabilities of probabilities is
regularly used in applied Bayesian work.

⁶Professor Suppes points out to me that, though there is a rich body of
results in meta-mathematics, mathematicians apparently feel no need to
derive formal results concerning meta-mathematics in a
meta-meta-mathematics. I might add, however, concerning the probable
error example that several years ago when I was helping design an
experiment to measure the astronomical unit, I found the notion of
probable error in probable error rather useful.

⁷Laplace's law of succession is derived from Bayes' theorem and the
assumption of a uniform prior for $\xi_i$. If the uniform prior is
changed to any of the possibilities given in equation (8), the following
generalization of the law of succession can be derived:
$p_{r+1}(\omega_i) = (n+1)/(r+m)$, where $p_{r+1}(\omega_i)$ is the
(expectation of) the probability that on the r + 1st trial $\omega_i$
will occur, n is the number of times it has occurred in the previous r
trials, and m is the number of states of nature. Since completing a
draft of this paper, Raimo Tuomela has pointed out to me that Good [25]
has discussed notions that are formally analogous to $f(\xi)$. Good
mentions that this generalized version of the law of succession was
known to Lidstone in 1925.

⁸This assertion must be slightly qualified; the Dirichlet density is
undefined for $\nu_i = 0$. Hence, though the inductive method
characterized by $\lambda = 0$ may be approached with arbitrary
closeness, it cannot be attained in the subjective system. This point
is of some importance, since $\lambda = 0$ is the inductive system
implicit in the "maximum likelihood" estimation principle that is rather
widely used, at least in psychology.


Part Three/Two

LEARNING AND THE STRUCTURE OF INFORMATION
I. PAIRED-ASSOCIATE LEARNING

1. Paired-Associate Learning with Complete Information

In the experimental paradigm for the theories discussed in this
section, the experimenter presents the subject with stimuli in random
order. Each stimulus is paired to exactly one of N response
alternatives. After seeing a stimulus, the subject chooses the response
he believes is correct. After the subject has made a choice, the
experimenter tells him what the correct response was. The subject then
proceeds to the next stimulus. This correction procedure is
distinguished from noncorrection procedures in which the subject is
told only whether he was correct or incorrect. Noncorrection procedures
are discussed briefly in Part II, Section 2 with other theories of
incomplete information. Certain of our proposed models for the
correction procedure bear mild resemblance to models for the
noncorrection procedure presented by Millward [45] and Nahinsky [47].

The objective of a theory of PAL (paired-associate learning) is to
predict the detailed statistical structure of subjects' response data
in the type of experimental paradigm just described. Theories of PAL
have the following general structure. For each of the (homogeneous)
stimulus items there exists a set $\mathcal{S}$ of states that the
subject may be in on any trial and a set $\mathcal{R}$ of response
alternatives that he may choose from. There further exists a set
$\mathcal{E}$ of reinforcing events. Finally, there exist two
functions: a function f that maps $\mathcal{S} \times \mathcal{R}$ into
[0,1] and a function g that maps
$\mathcal{S} \times \mathcal{E} \times \mathcal{S}$ into [0,1]. (Here
$\mathcal{S} \times \mathcal{R}$ denotes the cross product of the sets
$\mathcal{S}$ and $\mathcal{R}$.) The function f gives the
probabilities of the various responses for each state; the function g
gives the probabilities of state transitions for various
reinforcements. In this way, a model of PAL may be considered an
ordered quintuple,
$\langle \mathcal{S}, \mathcal{R}, \mathcal{E}, f, g \rangle$. A
particular theory specifies precisely the members of the three sets and
the form of the two functions.

The  remainder  of  this  section  is  divided  into  two  parts.  In  the 
first  part  we  give  a  brief  review  of  eight  existing  theories  of  PAL. 
In  the  second  part  we  present  several  new  theories.  For  each  new 
theory  we  present  informally  its  assumptions,  its  basic  mathematical 
structure,  a  few  derivations,  and  its  relations  to  other  theories. 

Existing theories of paired-associate learning
The linear model. Let $p(e_n)$ denote the probability of an error
occurring on trial n. The basic assumption of the linear model is that
$p(e_{n+1})$ is a fixed fraction of $p(e_n)$, specifically:

(1)  $p(e_{n+1}) = \theta\, p(e_n)$

If we make the natural assumption that $p(e_1)$ be equal to (N-1)/N,
then

(2)  $p(e_n) = \frac{N-1}{N}\, \theta^{n-1}$

Bush and Mosteller [14] described the linear model in some detail.

The one-element model. The principal assumption of the one-element
model is that for each stimulus element the subject is in one of two
states: conditioned to the correct response or not conditioned to it.
If he is not conditioned, then with probability c on any trial he
becomes conditioned; once he becomes conditioned, he remains so. If the
subject is conditioned, he responds correctly; if he is not, he guesses,
responding correctly with probability 1/N. The following transition
matrix and error response probability vector summarize the one-element
model, where $C$ and $\bar{C}$ represent the conditioned and
unconditioned states:

\[
\begin{array}{c|cc}
        & C & \bar{C} \\ \hline
C       & 1 & 0       \\
\bar{C} & c & 1-c
\end{array}
\qquad
\begin{bmatrix} 0 \\ \frac{N-1}{N} \end{bmatrix}
\tag{3}
\]

This matrix gives the probabilities of transition from one state
to the next on each trial; the vector gives the probability of making
an incorrect response in each state. The probability of error on trial
n is easily shown to be given by:

\[ p(e_n) = \frac{N-1}{N} \, (1-c)^{n-1}. \tag{4} \]
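The step from the two-state chain to the closed-form learning curve can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes the reading of the one-element model given in the text (start unconditioned, become conditioned with probability c, guess wrongly with probability (N-1)/N while unconditioned). The function names are illustrative.

```python
from fractions import Fraction


def one_element_error_curve(c, n_alternatives, n_trials):
    """Error probabilities for trials 1..n_trials from the two-state chain."""
    guess_error = Fraction(n_alternatives - 1, n_alternatives)
    p_unconditioned = Fraction(1)  # every item starts unconditioned
    curve = []
    for _ in range(n_trials):
        # Errors occur only in the unconditioned state, through wrong guesses.
        curve.append(p_unconditioned * guess_error)
        # With probability c the item becomes conditioned and stays so.
        p_unconditioned *= 1 - c
    return curve


def closed_form_error(c, n_alternatives, n):
    """Closed-form error probability on trial n: ((N-1)/N) (1-c)^(n-1)."""
    return Fraction(n_alternatives - 1, n_alternatives) * (1 - c) ** (n - 1)
```

Exact fractions are used so that the iterated chain and the closed form can be compared for equality rather than to within rounding error.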


Bower [12] compared the linear and one-element models on a wide
variety of statistics for experiments with N = 2.* The one-element model
fits much better than the linear model. But when N > 2, the one-element
model performs less well, although still better than the linear model.

* Clearly we cannot distinguish between the linear and one-element
models by equations 2 and 4, as they are essentially the same; the models
predict very different dependencies within the data, however.

The two-phase model. Norman [49] proposed a two-phase model for
which he assumes that no learning occurs up to some trial k; after
trial k, learning proceeds linearly with parameter θ. The trial of
first learning, k, is geometrically distributed with parameter c. The
probability of error on trial n is given by:

\[
p(e_n) =
\begin{cases}
\dfrac{N-1}{N} & \text{for } n \le k \\[2ex]
\dfrac{N-1}{N} \, (1-\theta)^{n-k} & \text{for } n > k.
\end{cases}
\tag{5}
\]

When θ = 1, this equation reduces to the one-element model; when c = 1,
the equation reduces to the linear model.

The random-trial incremental model. In Norman [48], the RTI
(random-trial incremental) model postulated that on each trial learning
occurs with probability c; if it does occur, it does so linearly
with learning parameter θ. The following equation summarizes the
model:

\[
p(e_{n+1}) =
\begin{cases}
(1-\theta) \, p(e_n) & \text{with probability } c \\
p(e_n) & \text{with probability } 1-c.
\end{cases}
\tag{6}
\]

As with the two-phase model, if θ = 1, the RTI model reduces to the
one-element model and if c = 1, it reduces to the linear model.
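The two reductions just stated can be illustrated by averaging the RTI recursion over the random number of learning events. The sketch below is an illustration under an assumption stated here and in the code: that the RTI rule multiplies the error probability by (1 - θ) on each trial where learning occurs (probability c) and leaves it unchanged otherwise. The number of learning events in the first n - 1 trials is then binomial, and the mean error works out to ((N-1)/N)(1 - cθ)^(n-1). The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def rti_mean_error(c, theta, n_alternatives, n):
    """Mean error probability on trial n under the assumed RTI rule."""
    start = Fraction(n_alternatives - 1, n_alternatives)
    total = Fraction(0)
    for k in range(n):  # k = number of learning events in trials 1..n-1
        weight = comb(n - 1, k) * c ** k * (1 - c) ** (n - 1 - k)
        # Each learning event shrinks the error probability by (1 - theta).
        total += weight * (1 - theta) ** k
    return start * total
```

Setting theta to 1 leaves only the term with no learning events, which is the one-element curve; setting c to 1 leaves only the term with n - 1 learning events, which is a linear-model curve with retained fraction 1 - θ.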

The two-element model. Both the two-phase and the RTI models
primarily represent extensions of the linear model; Suppes and Ginsberg
[69] suggested an extension of the one-element model to a
two-element model. The subject is in any one of three states, $C_0$,
$C_1$, and $C_2$; the subscript refers to the number of stimulus elements
conditioned to the correct response. Those not conditioned to the
correct response are unconditioned. The transition matrix and error
probability vector given below summarize the model:

\[
\begin{array}{c|ccc}
    & C_2 & C_1 & C_0 \\ \hline
C_2 & 1   & 0   & 0   \\
C_1 & b   & 1-b & 0   \\
C_0 & 0   & a   & 1-a
\end{array}
\qquad
\begin{bmatrix} 0 \\ 1-g \\ \frac{N-1}{N} \end{bmatrix}
\tag{7}
\]

The model has three parameters: the conditioning probabilities a
and b and the guessing probability g for when the subject is in
state $C_1$. Predicting a stationary probability of success prior to
last error is one of the major shortcomings of the one-element model;
the two-element model avoids this shortcoming.

The long-short model. In their comprehensive overview of
paired-associate learning models, Atkinson and Crothers [7] proposed a
model based on the distinction between long- and short-term stores.
In state L the subject has the S-R association in long-term store
and remembers it. In state S the subject always responds correctly,
but may forget the association and drop back to a guessing state F.
State F is initially reached by 'coding' the stimulus element from
an uncoded state U; this coding occurs with probability c. The other
parameters of the model are the probability a that when reinforcement
occurs the subject goes into state L, and the probability f that an
item in state S will move back to F. The transition matrix and error
probability vector of the model are given below:

\[
\begin{array}{c|cccc}
  & L  & S           & F       & U   \\ \hline
L & 1  & 0           & 0       & 0   \\
S & a  & (1-a)(1-f)  & (1-a)f  & 0   \\
F & a  & (1-a)(1-f)  & (1-a)f  & 0   \\
U & ca & c(1-a)(1-f) & c(1-a)f & 1-c
\end{array}
\qquad
\begin{bmatrix} 0 \\ 0 \\ \frac{N-1}{N} \\ \frac{N-1}{N} \end{bmatrix}
\tag{8}
\]

The three-parameter version of this model is referred to as LS-3;
a two-parameter version, LS-2, is obtained by setting c = 1.

Atkinson and Crothers point out that this model was constructed with
an emphasis on reproducing specific psychological processes, though
the reason the transition from S to S should have the same
probability as the one from F to S remains unclear. Both the LS-3 and
LS-2  models  fit  the  data  very  well.  Extensions  of  the  LS-3  model 
and  a  trial-dependent  forgetting  (TDF)  model  to  account  for  vari¬ 
ations  in  list  length  are  presented  in  the  Atkinson  and  Crothers 
paper  and  extended  by  Calfee  and  Atkinson  [15].  Rumelhart  [54] 
presented  an  illuminating  overview  and  extensions  of  these  models. 
However,  we  will  discuss  these  variations  no  further. 
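The long-short model can be made concrete by building its transition matrix and iterating the state vector from the uncoded state U. The following is a minimal sketch, assuming the state order L, S, F, U and the entries described in the text (a for storage in L, f for forgetting from S to F, c for coding out of U); the helper names are illustrative and not from the source.

```python
from fractions import Fraction


def ls3_model(a, f, c, n_alternatives):
    """Transition matrix (rows and columns ordered L, S, F, U) and error vector."""
    guess_error = Fraction(n_alternatives - 1, n_alternatives)
    stay_s = (1 - a) * (1 - f)  # not stored long-term, still held short-term
    drop_f = (1 - a) * f        # not stored long-term and forgotten
    matrix = [
        [1, 0, 0, 0],
        [a, stay_s, drop_f, 0],
        [a, stay_s, drop_f, 0],
        [c * a, c * stay_s, c * drop_f, 1 - c],
    ]
    errors = [0, 0, guess_error, guess_error]  # only F and U produce errors
    return matrix, errors


def error_curve(matrix, errors, start, n_trials):
    """Error probability on trials 1..n_trials for a given start vector."""
    state = list(start)
    size = len(state)
    curve = []
    for _ in range(n_trials):
        curve.append(sum(p * e for p, e in zip(state, errors)))
        # Row vector times matrix gives the next trial's state probabilities.
        state = [sum(state[i] * matrix[i][j] for i in range(size))
                 for j in range(size)]
    return curve
```

Passing c = 1 gives the two-parameter LS-2 version; the start vector [0, 0, 0, 1] places the item in U on trial 1.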

A forgetting model. Bernbach [11] proposed a three-parameter
forgetting model for paired-associate learning. In state C the subject
is always correct, and in state G he is correct with probability
1/N. Immediately after reinforcement the subject is in state C;
presumably if he were immediately tested he would always be correct, but
before the next presentation of the stimulus there is a probability δ
that he will forget. If the subject is in state C, then with
probability θ he permanently acquires the S-R association and moves to
state C'. Finally, there is a probability β that if the subject guesses
incorrectly, he learns the incorrect response he guessed. If so, he
goes to state E in which his probability of success is zero. The
forgetting model is represented by the following transition matrix
and error probability vector:

\[
\begin{array}{c|cccc}
   & C'     & C                                           & G                  & E \\ \hline
C' & 1      & 0                                           & 0                  & 0 \\
C  & \theta & (1-\theta)(1-\delta)                        & (1-\theta)\delta   & 0 \\
G  & 0      & \left[1-\beta\frac{N-1}{N}\right](1-\delta) & \delta             & \beta\frac{N-1}{N}(1-\delta) \\
E  & 0      & (1-\beta)(1-\delta)                         & \delta             & \beta(1-\delta)
\end{array}
\qquad
\begin{bmatrix} 0 \\ 0 \\ 1-\frac{1}{N} \\ 1 \end{bmatrix}
\tag{9}
\]

Bernbach performed some experiments in which the forgetting model
does rather better than the one-element model.

This completes our discussion of a number of existing models for
paired-associate learning. We now turn to some new models.

New theories of paired-associate learning

The Dirichlet model. The name "Dirichlet" is applied to this
model since the generalization developed in Part II, Section 2 uses
the general Dirichlet density. The model we shall now consider uses
the one-dimensional version of the Dirichlet family known as the
beta density. The intuitive idea of the model is that the subject
can be in any state indexed by numbers on the interval [0,1]. If
the subject is in state r (0 ≤ r ≤ 1) on trial n, he responds correctly
with probability r, and his state on trial n+1 is drawn from a beta
density on the interval [r,1]. Figure 1 illustrates this.

[Fig. 1. The density for $r_{n+1}$, given $r_n$.]

Let us state the assumptions more explicitly:

1. The state the subject is in on trial n is indexed by a real
number $r_n$ such that $0 \le r_n \le 1$.

2. If the subject is in state $r_n$, he responds correctly with
probability $r_n$.

3. Let $f(r_{n+1} \mid r_n)$ be the density for $r_{n+1}$ given $r_n$.
Then $f(r_{n+1} \mid r_n) = 0$ if $r_{n+1} < r_n$ or if $r_{n+1} > 1$, and

\[
f(r_{n+1} \mid r_n) = \frac{1}{B(\alpha,\beta)} \,
\frac{(r_{n+1}-r_n)^{\alpha-1} \, (1-r_{n+1})^{\beta-1}}{(1-r_n)^{\alpha+\beta-1}}
\tag{10}
\]

if $r_n \le r_{n+1} \le 1$, $\alpha > 0$, and $\beta > 0$.
The function $B(\alpha,\beta)$ is the beta function of α and β and is
defined to equal

\[ \int_0^1 x^{\alpha-1} (1-x)^{\beta-1} \, dx. \]

4. On the first trial $r_1 = 1/N$, where N is the number of response
alternatives.

Theorem 1. The learning curve for the Dirichlet model is given by

\[ p(e_n) = \frac{N-1}{N} \left( \frac{\beta}{\alpha+\beta} \right)^{n-1}. \]

Proof: Denote the expected value of $r_{n+1}$ given $r_n$ by
$E(r_{n+1} \mid r_n)$. It is an elementary property of beta densities
that the density $\frac{1}{B(\alpha,\beta)} x^{\alpha-1}(1-x)^{\beta-1}$
for $0 < x < 1$ has expectation $\alpha/(\alpha+\beta)$. Hence,

\[ E(r_{n+1} \mid r_n) = r_n + \frac{\alpha}{\alpha+\beta} \, (1-r_n). \tag{11} \]

Now $r_n$ is itself a random variable. The expected value of $r_{n+1}$
given $r_n$ is a linear function of $r_n$. But the expected value of a
linear function of a random variable is simply equal to that same linear
function of the expected value of the random variable, i.e.,

\[ E(r_{n+1}) = E(r_n) + \frac{\alpha}{\alpha+\beta} \, [1 - E(r_n)]. \tag{12} \]

Thus from $r_1$ we can find the expected value of $r_2$; from the expected
value of $r_2$ we find the expected value of $r_3$, etc. It follows that

\[
E(1-r_{n+1}) = \left[ 1 - \frac{\alpha}{\alpha+\beta} \right] E(1-r_n)
             = \frac{\beta}{\alpha+\beta} \, E(1-r_n).
\tag{13}
\]

Since $1 - r_1 = \frac{N-1}{N}$, by recursion on (13) it follows that
$E(1-r_n) = \frac{N-1}{N} \, [\beta/(\alpha+\beta)]^{n-1}$. But $p(e_n)$ is
simply equal to $E(1-r_n)$; hence,

\[ p(e_n) = \frac{N-1}{N} \left( \frac{\beta}{\alpha+\beta} \right)^{n-1}. \tag{14} \]

Q.E.D.
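The recursion used in the proof can be run directly. The sketch below is an illustration, assuming the reading of the proof in which the expected state moves a fraction α/(α+β) of the remaining distance to 1 on every trial, starting from 1/N. It tracks only the expectation of the state, not its full distribution, and the function name is hypothetical.

```python
from fractions import Fraction


def dirichlet_error_curve(alpha, beta, n_alternatives, n_trials):
    """Error probabilities 1 - E(r_n) for trials 1..n_trials."""
    rate = Fraction(alpha) / (Fraction(alpha) + Fraction(beta))
    expected_r = Fraction(1, n_alternatives)  # first-trial state is 1/N
    curve = []
    for _ in range(n_trials):
        curve.append(1 - expected_r)
        # Expected state moves a fraction alpha/(alpha+beta) of the way to 1.
        expected_r = expected_r + rate * (1 - expected_r)
    return curve
```

For example, with alpha = 1, beta = 3 and N = 4 the curve is geometric with ratio 3/4, which is the form stated in Theorem 1.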


The quantity $\frac{\alpha}{\alpha+\beta} = 1 - \frac{\beta}{\alpha+\beta}$
represents the learning rate in this model; the learning curve generated
is the same as for the linear and one-element models. In fact, both the
linear and one-element models are special cases of the Dirichlet. The
linear model results from setting $\theta = \frac{\beta}{\alpha+\beta}$ and
allowing α and β to approach infinity. The one-element model results from
setting $c = \frac{\alpha}{\alpha+\beta}$ and letting α and β approach
zero. The behavior of $f(r_{n+1} \mid r_n)$ for various values of α is
shown in Figure 2, where $\frac{\alpha}{\alpha+\beta} = .25$ and $r_n = .2$.


We assume that the subject fails to learn on each trial with some
fixed probability, 1-γ, but when he does learn, $r_{n+1}$ is given by (10),
which results in a three-parameter generalization of the Dirichlet
model. Letting γ = 1 gives the two-parameter Dirichlet. If γ = c ≠ 1,
then letting α approach infinity with α/(α+β) held fixed gives Norman's
RTI model as a special case of the three-parameter Dirichlet model. The
three-parameter Dirichlet model is an example of what Howard* calls a
'Markovian dynamic inference' model, with a continuous-state Markov chain.

* Howard, R. A., Systems Analysis of Markov Processes, to appear.

[Fig. 2. $f(r_{n+1} \mid r_n = .2)$ for $\alpha/(\alpha+\beta) = .25$ and
several values of α. Functions graphed schematically.]
The elimination model. The basic assumption of this model is that
the subject learns by eliminating responses known to be incorrect. He
eliminates each response possible on a given trial with a fixed
probability, ε, independently of whether he eliminates other incorrect
responses. More explicitly, the assumptions of the model are:

1. If there are N response alternatives, the subject can be in
any of N states labeled from 0 to N*, where N* is the number
of wrong responses (N* = N - 1). If the subject is in state i
(0 ≤ i ≤ N*), he has i possible wrong responses left to eliminate.

2. If the subject is in state i, the probability that he will
make a correct response is 1/(i+1).

3. If the subject enters a trial in state i, after being
reinforced he eliminates each of the i remaining incorrect responses with
probability ε, independently of the others.

4. Entering trial 1, the subject is in state N*.

A few definitions are useful before deriving the learning curve.
The vector $S_n = (s_n^{(0)}, s_n^{(1)}, \ldots, s_n^{(i)}, \ldots, s_n^{(N^*)})$
is the row vector that gives the probability of being in state i on
trial n. The transition matrix $T = [t_{ij}]$ and response probability
vector E are defined as follows.

\[
t_{ij} =
\begin{cases}
\binom{i}{j} \, \epsilon^{\,i-j} (1-\epsilon)^{j} & \text{for } 0 \le j \le i \\
0 & \text{otherwise}
\end{cases}
\tag{15}
\]

\[ E = \left[ \frac{i}{i+1} \right], \qquad i = 0, 1, \ldots, N^*. \tag{16} \]

For N = 4, T and E are as follows:

\[
T =
\begin{array}{c|cccc}
  & 0          & 1                       & 2                       & 3 \\ \hline
0 & 1          & 0                       & 0                       & 0 \\
1 & \epsilon   & 1-\epsilon              & 0                       & 0 \\
2 & \epsilon^2 & 2\epsilon(1-\epsilon)   & (1-\epsilon)^2          & 0 \\
3 & \epsilon^3 & 3\epsilon^2(1-\epsilon) & 3\epsilon(1-\epsilon)^2 & (1-\epsilon)^3
\end{array}
\qquad
E = \begin{bmatrix} 0 \\ 1/2 \\ 2/3 \\ 3/4 \end{bmatrix}
\tag{17}
\]
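\[
t_{ij}^{(n)} = \sum_{k=j}^{i} \binom{i}{k} \binom{k}{j} \,
[\gamma(1-\epsilon)]^{j} \, (1-\gamma)^{i-k} \, (\gamma\epsilon)^{k-j},
\tag{18}
\]

with $\gamma = (1-\epsilon)^{n-1}$.

We now change the index of summation to a = k - j and let d represent
i - j. Therefore,

\[
\begin{aligned}
t_{ij}^{(n)} &= \binom{i}{j} [\gamma(1-\epsilon)]^{j}
                \sum_{a=0}^{d} \binom{d}{a} (1-\gamma)^{d-a} (\gamma\epsilon)^{a} \\
             &= \binom{i}{j} [\gamma(1-\epsilon)]^{j} \, [(1-\gamma) + \gamma\epsilon]^{d} \\
             &= \binom{i}{j} [\gamma(1-\epsilon)]^{j} \, [1 - \gamma(1-\epsilon)]^{i-j} \\
             &= \binom{i}{j} (1-\epsilon)^{nj} \, [1 - (1-\epsilon)^{n}]^{i-j}.
\end{aligned}
\tag{19}
\]

This completes the subsidiary proof that $t_{ij}^{(n)}$ is given by

\[
t_{ij}^{(n)} =
\begin{cases}
\binom{i}{j} \, [1 - (1-\epsilon)^{n}]^{i-j} \, (1-\epsilon)^{nj} & \text{if } 0 \le j \le i \\
0 & \text{otherwise.}
\end{cases}
\tag{20}
\]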


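Multiplying $S_1$ by $T^{n-1}$ gives $S_n = S_1 T^{n-1}$, where

\[
s_n^{(j)} = \binom{N^*}{j} \, [1 - (1-\epsilon)^{n-1}]^{N^*-j} \, (1-\epsilon)^{(n-1)j}.
\tag{21}
\]

Multiplying this row vector by the column vector E, we obtain:

\[
p(e_n) = \sum_{j=0}^{N^*} \frac{j}{j+1} \binom{N^*}{j} \,
[1 - (1-\epsilon)^{n-1}]^{N^*-j} \, (1-\epsilon)^{(n-1)j},
\tag{22}
\]

which can be transformed to:

\[
\begin{aligned}
p(e_n) &= \sum_{j=0}^{N^*} \binom{N^*}{j} [1 - (1-\epsilon)^{n-1}]^{N^*-j} (1-\epsilon)^{(n-1)j} \\
       &\quad - \frac{1}{(1-\epsilon)^{n-1}(N^*+1)} \sum_{k=1}^{N^*+1} \binom{N^*+1}{k}
         [1 - (1-\epsilon)^{n-1}]^{N^*+1-k} (1-\epsilon)^{(n-1)k} \\
       &= 1 - \frac{1 - [1 - (1-\epsilon)^{n-1}]^{N^*+1}}{(1-\epsilon)^{n-1}(N^*+1)}.
\end{aligned}
\tag{23}
\]

Q.E.D.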


The learning curve is the only statistic we shall derive for the
elimination model. Before going on to extensions of this model, we
should point out the following: First, when N = 2, the elimination model
is formally identical to the one-element model, and, second, when N > 2,
the model predicts increasing probability of success prior to the trial
of last error. This model is compared against data presented by Atkinson
and Crothers in Table 1.
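The elimination-model learning curve can be cross-checked by computing it two ways: by stepping the state vector through the transition matrix, and by a closed form. The sketch below is an illustration under assumptions: the matrix entry is taken to be C(i,j) ε^(i-j) (1-ε)^j for j ≤ i, the error probability in state j is taken to be j/(j+1), and the closed form is the expression 1 - (1 - [1 - (1-ε)^(n-1)]^N) / (N (1-ε)^(n-1)), valid for ε < 1. Function names are illustrative.

```python
from fractions import Fraction
from math import comb


def elimination_matrix(eps, n_alternatives):
    """One-trial transition matrix over states 0..N* (wrong responses left)."""
    n_wrong = n_alternatives - 1
    return [[comb(i, j) * eps ** (i - j) * (1 - eps) ** j if j <= i else 0
             for j in range(n_wrong + 1)]
            for i in range(n_wrong + 1)]


def error_by_matrix(eps, n_alternatives, n):
    """Error probability on trial n by repeated vector-matrix multiplication."""
    n_wrong = n_alternatives - 1
    matrix = elimination_matrix(eps, n_alternatives)
    state = [0] * n_wrong + [1]  # trial 1 starts in state N*
    for _ in range(n - 1):
        state = [sum(state[i] * matrix[i][j] for i in range(n_wrong + 1))
                 for j in range(n_wrong + 1)]
    # In state j the subject guesses among j + 1 responses, j of them wrong.
    return sum(p * Fraction(j, j + 1) for j, p in enumerate(state))


def error_closed_form(eps, n_alternatives, n):
    """Closed-form error probability on trial n (requires eps < 1)."""
    survive = (1 - eps) ** (n - 1)  # chance a given wrong response survives
    return 1 - (1 - (1 - survive) ** n_alternatives) / (n_alternatives * survive)
```

With N = 2 the closed form collapses to (1/2)(1-ε)^(n-1), the one-element curve with c = ε, in line with the remark in the text.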

The acquisition/elimination models. These models are two- and
three-parameter generalizations of the elimination model. The basic
notion behind the two-parameter acquisition/elimination model (AE-2)
is that there is some probability c that the subject learns the correct
response on any particular trial. If he fails to do so, he eliminates
incorrect responses with probability ε as in the elimination model.
More explicitly, AE-2 makes the same assumptions as the elimination
model except that Assumption 3 is changed to:

3'. If the subject is in any state i, then after he is reinforced
he acquires the correct response with probability c. If he fails to
acquire the correct response then, with probability ε, he eliminates
each of the i remaining incorrect responses, independently of the others.


                               TABLE 1

          Minimum χ² Values for Four One-Parameter Models(c)

Experiment   One-Element*   Linear*   Elimination   Conditioning Strength

Ia(a)            30.30        50.92       15.03              8.11
Ib(a)            39.31        95.86       17.63             14.41
II(a)            62.13       251.56       32.71             31.80
III(b)          150.66       296.30      101.11             95.26
IV(b)            44.48       146.95       31.76             39.37
Va(b)           102.02       201.98       56.52             53.74
Vb(b)           246.96       236.15       97.50             85.69
Vc(b)           161.03       262.56      117.76             90.26

Total           836.89      1542.02      470.02            418.64

(a) Three response alternatives.
(b) Four response alternatives.
(c) Total χ² for other models: 2-parameter: RTI, 284.19; 2-phase,
493.59; LS-2, 147.16; 3-parameter: LS-3, 137.26; 2-element, 259.56.
*   Data from Atkinson and Crothers [7].

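The following transition matrix, $T' = [t'_{ij}]$, characterizes AE-2:

\[
t'_{ij} =
\begin{cases}
c + (1-c)\,\epsilon^{\,i} & \text{for } j = 0 \\
(1-c) \binom{i}{j} \epsilon^{\,i-j} (1-\epsilon)^{j} & \text{for } 1 \le j \le i \le N^* \\
0 & \text{otherwise.}
\end{cases}
\tag{24}
\]

For N = 4, the matrix is:

\[
T' =
\begin{array}{c|cccc}
  & 0                   & 1                            & 2                            & 3 \\ \hline
0 & 1                   & 0                            & 0                            & 0 \\
1 & c+(1-c)\epsilon     & (1-c)(1-\epsilon)            & 0                            & 0 \\
2 & c+(1-c)\epsilon^2   & 2(1-c)\epsilon(1-\epsilon)   & (1-c)(1-\epsilon)^2          & 0 \\
3 & c+(1-c)\epsilon^3   & 3(1-c)\epsilon^2(1-\epsilon) & 3(1-c)\epsilon(1-\epsilon)^2 & (1-c)(1-\epsilon)^3
\end{array}
\tag{25}
\]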
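The AE-2 matrix can be generated for any N from its component form. The following is a minimal sketch, assuming the component form described above: with probability c the subject acquires the correct response and moves to state 0, and otherwise eliminates each remaining wrong response independently with probability ε. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def ae2_matrix(c, eps, n_alternatives):
    """AE-2 transition matrix over states 0..N* (wrong responses left)."""
    n_wrong = n_alternatives - 1
    matrix = []
    for i in range(n_wrong + 1):
        row = []
        for j in range(n_wrong + 1):
            if j > i:
                row.append(Fraction(0))  # wrong responses are never added
                continue
            # No acquisition: binomial elimination of the i wrong responses.
            entry = (1 - c) * comb(i, j) * eps ** (i - j) * (1 - eps) ** j
            if j == 0:
                entry += c  # acquisition sends the subject straight to state 0
            row.append(entry)
        matrix.append(row)
    return matrix
```

Calling the function with c = 0 returns the plain elimination matrix, and each row sums to one for any admissible c and ε.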

Model AE-2 reduces to the elimination model if c = 0; it reduces to the
one-element model if ε = 0 or N = 2. It can be extended to three
parameters (AE-3) by assuming that when the subject learns an association
(with probability c) he may pick up several more than just the correct
one. The number he acquires is binomially distributed with parameters α
and i, i being his state index. For example, if the subject is in state i
and it is given that he learns on a particular trial, then with
probability $\alpha^i$ he acquires just the correct response. Intuitively,
α should be close to one. The assumptions of AE-3 are the same as those
of the elimination model and AE-2 except that we substitute 3'' for 3':

3''. If the subject is in state i at the beginning of a trial
then, when reinforced, with probability c he acquires the correct
response and up to i incorrect responses. He selects the number acquired
with a binomial distribution with parameters α and i. With probability
1-c the subject acquires nothing, but he eliminates incorrect responses
independently, each with probability ε.

The transition matrix for AE-3, $T'' = [t''_{ij}]$, is given in component
form by

\[
t''_{ij} =
\begin{cases}
c \binom{i}{j} \alpha^{\,i-j} (1-\alpha)^{j}
  + (1-c) \binom{i}{j} \epsilon^{\,i-j} (1-\epsilon)^{j}
  & \text{for } 0 \le j \le i \le N^* \\
0 & \text{otherwise.}
\end{cases}
\tag{26}
\]

If α = 1, AE-3 reduces to AE-2; if c = 0 or c = 1, AE-3 reduces to the
simple elimination model. The chief motivation for the AE-3 model is
that it can give a bimodal transition distribution, which the binomial
distribution in AE-2 cannot do.
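The bimodality claim can be seen in a small numerical case. The sketch below is illustrative only: it assumes the AE-3 entry is a c-weighted mixture of two binomial terms, one with parameter α and one with parameter ε, and the parameter values in the usage note are chosen for the illustration rather than taken from any data. Helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binomial_entry(i, j, p_drop):
    """Chance that j of i wrong responses remain when each is dropped with p_drop."""
    return comb(i, j) * p_drop ** (i - j) * (1 - p_drop) ** j


def ae3_matrix(c, alpha, eps, n_alternatives):
    """AE-3 transition matrix as a mixture of two binomial rows."""
    n_wrong = n_alternatives - 1
    return [
        [
            c * binomial_entry(i, j, alpha) + (1 - c) * binomial_entry(i, j, eps)
            if j <= i else Fraction(0)
            for j in range(n_wrong + 1)
        ]
        for i in range(n_wrong + 1)
    ]
```

With c = 1/2, α = 19/20 and ε = 1/20, the row for state 3 when N = 4 puts most of its mass on states 0 and 3 and little on states 1 and 2, which is the two-peaked shape the text refers to.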

An elimination model with forgetting. In the incorrect-response
elimination models discussed so far, there has been no provision for
regressing to a state in which the subject responds from more wrong
responses, that is, for forgetting. It is plausible to assume that
during the intertrial interval, after the subject has eliminated perhaps
several incorrect responses, he might forget which ones he had
eliminated, thus introducing some more wrong responses. The basic
assumption of this forgetting model is that the responses learned
previously to be incorrect are reintroduced, independently of one
another, with some probability δ. More explicitly, the assumptions
are:

1. If there are N response alternatives, the subject can be
in any of N states labeled from 0 to N*, where N* = N-1. If the
subject is in state i (0 ≤ i ≤ N*), he has i possible wrong responses
left to eliminate.

2. If the subject is in state i, the probability that he will
make a correct response is 1/(i+1).

3. If the subject enters a trial in state i, after being
reinforced he eliminates each of the i remaining incorrect responses
with probability ε, independently of the others.

4. Unless the subject is in state 0, between trials he forgets
each response previously learned to be incorrect with probability δ,
independently of the others. If the subject is in state 0, he stays
there.

5. When the subject enters trial 1, he is in state N*.

The subject enters trial 1 with state probability vector
$S_1 = (0, 0, \ldots, 0, \ldots, 1)$ by Assumption 5. Shortly after
reinforcement, the subject has state probability vector given by:

\[ S_1' = S_1 T, \tag{27} \]

where T is the transition matrix given by (15). During the intertrial
interval the subject may forget; his forgetting or reintroduction is
represented by a matrix F that operates on $S_1'$. $F = [f_{ij}]$ is
given by:

Thus $S_2 = S_1' F = S_1 T F$. Or, more generally,

\[ S_n = S_1 (TF)^{n-1}, \tag{30} \]

and

\[ p(e_n) = S_1 (TF)^{n-1} E. \tag{31} \]

Clearly this forgetting model could be generalized by replacing
T with T' (24) or by T'' (26).

A conditioning strength model. Atkinson [6] suggested a
generalization of stimulus-sampling theory that embodies the notion
of 'conditioning strength'. Each response alternative has associated
with it a conditioning strength; the total available amount of
conditioning strength remains constant over trials. The probability that
any given response will be made is its conditioning strength divided
by the total available. Our model specializes Atkinson's work to
paired-associate learning and generalizes it to include richer ways
of redistributing conditioning strength after reinforcement. The
assumptions of our model are:

1. If there are N response alternatives, the subject can be in
any of N states. If the subject is in state i (0 ≤ i ≤ N-1), the
conditioning strength of the correct response is N-i. The total available
conditioning strength is N.

2. The probability of a correct response is equal to the response
strength of the correct response divided by total response strength.
That is to say, if the subject is in state i, his probability of being
correct is $\frac{N-i}{N}$ and the probability of being incorrect is
$\frac{i}{N}$.

3. If the subject is in state i on trial n, on trial n+1 he can
be in any state between i and 0; which state he enters is given by
a binomial distribution with parameters i and 1-α.

4. On trial 1, i = N-1.

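The transition matrix of this model is identical to that of the
elimination model; all that differs is the response probability vector.
The matrix and response probability vector are shown below.

\[
T^* =
\begin{array}{c|cccccc}
       & 0            & 1                   & 2                   & 3            & \cdots & N-1 \\ \hline
0      & 1            & 0                   & 0                   & 0            & \cdots & 0 \\
1      & \alpha       & (1-\alpha)          & 0                   & 0            & \cdots & 0 \\
2      & \alpha^2     & 2\alpha(1-\alpha)   & (1-\alpha)^2        & 0            & \cdots & 0 \\
3      & \alpha^3     & 3\alpha^2(1-\alpha) & 3\alpha(1-\alpha)^2 & (1-\alpha)^3 & \cdots & 0 \\
\vdots & \vdots       &                     &                     &              &        & \vdots \\
N-1    & \alpha^{N-1} & \cdots              &                     &              &        & (1-\alpha)^{N-1}
\end{array}
\qquad
E^* = \begin{bmatrix} 0/N \\ 1/N \\ 2/N \\ \vdots \\ \frac{N-1}{N} \end{bmatrix}
\tag{32}
\]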
The learning curve is given by:

\[ p(e_n) = S_1 T^{*(n-1)} E^*. \tag{33} \]

$T^{*(n-1)}$ is given by (20) and $S_1 T^{*(n-1)}$ by (21), where N* must
be replaced by N-1. Multiplying $S_1 T^{*(n-1)}$ by $E^*$, we obtain

\[
p(e_n) = \frac{1}{N} \sum_{j=0}^{N-1} j \binom{N-1}{j} \,
[1 - (1-\alpha)^{n-1}]^{N-1-j} \, (1-\alpha)^{j(n-1)}.
\tag{34}
\]

Ignoring N in the denominator, what remains is the expression for the
expectation of a binomial density with parameters N-1 and
$(1-\alpha)^{n-1}$. As this expectation is $(N-1)(1-\alpha)^{n-1}$,

\[ p(e_n) = \frac{N-1}{N} \, (1-\alpha)^{n-1}, \tag{35} \]

which is the same learning curve as that for the linear and one-element
models.

Clearly two- and three-parameter generalizations of the conditioning
strength model are obtained by using the matrices given in (24) and (26)
instead of T*.
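The binomial-expectation argument can be confirmed by summing the state distribution directly. The following is a minimal sketch, assuming that on trial n the state index is binomial with parameters N-1 and (1-α)^(n-1), and that the error probability in state j is j/N, as the model's assumptions indicate. The function name is illustrative.

```python
from fractions import Fraction
from math import comb


def conditioning_strength_error(alpha, n_alternatives, n):
    """Error probability on trial n, summed over the n-step state distribution."""
    top = n_alternatives - 1
    keep = (1 - alpha) ** (n - 1)  # binomial parameter after n - 1 trials
    total = Fraction(0)
    for j in range(top + 1):
        p_state = comb(top, j) * keep ** j * (1 - keep) ** (top - j)
        total += p_state * Fraction(j, n_alternatives)  # error chance j/N in state j
    return total
```

For any α between 0 and 1 the sum agrees with ((N-1)/N)(1-α)^(n-1), the geometric form shared with the linear and one-element models.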

Comparison of the one-parameter elimination and conditioning strength
models. Atkinson and Crothers [7] presented results from eight PAL
experiments, in which three have three response alternatives and five
have four response alternatives. Parameters are estimated by a minimum
χ² technique from the 16 possible sequences in the data of correct and
incorrect responses on trials 2 to 5. Atkinson and Crothers give results
for many models; their results for the linear and one-element models are
shown in Table 1 (see p. 16). Also shown in Table 1 are the results we
obtained for the one-parameter elimination and conditioning strength
models. Table 2 shows the parameter estimates. Our theoretical
predictions were obtained by computer simulation.


                               TABLE 2

          Parameter Estimates for Four One-Parameter Models

Experiment   One-Element*   Linear*   Elimination   Conditioning Strength
                 (c)          (θ)         (ε)               (α)

Ia               .133         .414        .50               .55
Ib               .328         .328        .56               .60
II               .2[?]        .289        .[?]9             .69
III              .203         .253        .61               .70
IV               .281         .297        .52               .66
Va               .125         .164        .74               .84
Vb               .172         .250        .62               .70
Vc               .289         .336     [illegible]          .66

* Data from Atkinson and Crothers [7].
2. Paired-Associate Learning with Incomplete Information:
Noncontingent Case

The general structure considered in this subsection is
paired-associate learning with multiresponse reinforcement. We deal here
with noncontingent reinforcement, and then, in the next subsection, we
deal very briefly with reinforcement contingent on the subject's response.

On each trial the subject responds with one of N alternatives. He is
then reinforced with a subset of these N alternatives consisting of the
one correct response and D distractors, of cardinality A in all (where
A = D+1). If A is one, then the paradigm is exactly that of determinate
reinforcement just considered. If A is greater than one, then on any
single trial the subject cannot rationally determine the correct
response. On each trial the correct response is reinforced. The D
distractors are selected randomly on each trial from the N* possible wrong
responses. Thus over trials the correct response will be the one
response which is always reinforced. The subject's task is to make as
many correct responses as he can and to learn the correct response as
quickly as he can.

Normative model. Given the above paradigm, for some of the
extensions, it is necessary to make predictions about the optimal
behavior of a subject with perfect memory. Perfect memory of the entire
reinforcement history is not required for normative behavior. If on
each trial the reinforcement sets are intersected, then only the
resulting intersection needs to be remembered. Thus if on trial 1 the
subject is told that the correct response is among a, b, c, and d,
where A is 4, and on trial 2 that the correct response is among a, b,
c, and e, he need only remember a, b, and c, the members of the
intersection, at trial 3. He then intersects this with the next
reinforcement set. Successive reinforcements and intersections
eventually will lead to the correct response. The task now is
to describe the 'eventually'.
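The intersection strategy described above is simple to state as a procedure. The sketch below is an illustration, not part of the original text: it represents each reinforcement set as a collection of response labels and keeps only the running intersection. The function name and the third reinforcement set in the usage note are invented for the example.

```python
def normative_memory(reinforcement_sets):
    """Return the remembered intersection after each trial's reinforcement."""
    remembered = None
    history = []
    for shown in reinforcement_sets:
        shown = set(shown)
        # The first reinforcement set is remembered whole; later sets are
        # intersected with whatever is still in memory.
        remembered = shown if remembered is None else remembered & shown
        history.append(set(remembered))
    return history
```

For instance, `normative_memory(["abcd", "abce", "adef"])` yields the sets {a, b, c, d}, {a, b, c} and {a}; subtracting one from each set's size gives the state index (number of wrong responses still in memory) with which the subject enters the following trial.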

Let N states 0, 1, ..., N* be defined as for the elimination models
in Part II, Section 1. Thus state i is the state of having i wrong
responses, plus the correct one, that remain in the intersection on a
given trial immediately before making a response. The subject responds
from this set of i+1 responses, and then is shown A reinforcers. Since
the subject is assumed to be acting normatively he intersects the new
reinforcement set with the old intersection and remembers the resulting
intersection until the next trial. The number of wrong responses now
in memory is the cardinality of the intersection minus 1, and this is
the number of the state in which the subject enters the next trial.
Obviously j, the index of this new state, cannot be greater than i,
which after the first reinforcement cannot be greater than D.

Letting $NN = [nn_{ij}]$ be the transition matrix for the normative
model, the general expression follows immediately by considering the
transition from state i to j as the event of exactly j out of the D
reinforced distractors being among the i distractors in the previous
intersection.

Thus we obtain

\[
nn_{ij} =
\begin{cases}
\binom{D}{j} \dfrac{(i)_j \, (N^*-i)_{D-j}}{(N^*)_D}
  & \text{for } 0 \le j \le i \le N^* \text{ and } j \le D \\
0 & \text{otherwise,}
\end{cases}
\tag{36}
\]

where $(a)_b = \binom{a}{b} \cdot b! = a(a-1) \cdots (a-b+1)$.
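The normative transition probabilities can be computed for any N and A and compared with the hypergeometric form. The sketch below is an illustration under an assumption: that the entry for moving from state i to state j is C(D,j) (i)_j (N*-i)_(D-j) / (N*)_D with falling factorials, which is algebraically the same as C(i,j) C(N*-i, D-j) / C(N*, D). It requires D ≤ N*, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def falling(a, b):
    """Falling factorial (a)_b = a (a-1) ... (a-b+1); equals 1 when b = 0."""
    result = 1
    for k in range(b):
        result *= a - k
    return result


def normative_matrix(n_alternatives, n_reinforced):
    """Normative transition matrix over states 0..N* for A reinforced responses."""
    n_wrong = n_alternatives - 1  # N*, the number of wrong responses
    d = n_reinforced - 1          # D, distractors shown on each trial
    matrix = []
    for i in range(n_wrong + 1):
        row = []
        for j in range(n_wrong + 1):
            if j <= i and j <= d:
                numerator = comb(d, j) * falling(i, j) * falling(n_wrong - i, d - j)
                row.append(Fraction(numerator, falling(n_wrong, d)))
            else:
                row.append(Fraction(0))
        matrix.append(row)
    return matrix
```

For A = 2 the row for state i reduces to (N*-i)/N* in column 0 and i/N* in column 1, and the row for state N* always places probability one on state D.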


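The normative transition matrix and error vector for A = 2 are given as
an example.

\[
NN_2 =
\begin{array}{c|ccccc}
       & 0                 & 1             & 2      & \cdots & N^* \\ \hline
0      & 1                 & 0             & 0      & \cdots & 0 \\
1      & \frac{N^*-1}{N^*} & \frac{1}{N^*} & 0      & \cdots & 0 \\
\vdots & \vdots            & \vdots        & \vdots &        & \vdots \\
i      & \frac{N^*-i}{N^*} & \frac{i}{N^*} & 0      & \cdots & 0 \\
\vdots & \vdots            & \vdots        & \vdots &        & \vdots \\
N^*    & 0                 & 1             & 0      & \cdots & 0
\end{array}
\qquad
E = \begin{bmatrix} 0 \\ \frac{1}{2} \\ \vdots \\ \frac{i}{i+1} \\ \vdots \\ \frac{N^*}{N} \end{bmatrix}
\tag{37}
\]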

If $S_n$ is the state probability vector as before, and the subject again
enters trial 1 in state N*, by virtue of intersecting the reinforcement
subset with the entire set, the subject must enter trial 2 in
state D. Thus

\[
s_2^{(j)} =
\begin{cases}
1 & \text{if } j = D \\
0 & \text{otherwise.}
\end{cases}
\tag{38}
\]

This equation also can be obtained directly from the transition matrix
in (37).

Although states A through N* are irrelevant except for entering
state N* on the first trial, they will be needed later, and thus for
convenience are introduced here.

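The equation for the state vector is given below:

\[ S_n = S_1 \, NN^{\,n-1}. \tag{39} \]

Letting $S_n = [S_n', S_n'']$ with the partition after column D, and
letting

\[ NN = \begin{bmatrix} NN' & 0 \\ NN'' & 0 \end{bmatrix}, \]

with the partition after column and row D, we obtain

\[ S_n' = S_2' \, (NN')^{\,n-2}, \qquad S_n'' = 0, \qquad n \ge 2. \tag{40} \]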


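We now derive the normative learning curve. As before, the probability
of an error on trial n is found by multiplying $S_n$ and E; thus

(41)  $$P^*(e_n) = S_n E = \sum_{j=0}^{N^*} nn^{(n-1)}_{N^*,j}\,\frac{j}{j+1},$$

where $nn^{(n-1)}_{N^*,j}$ denotes the $(N^*, j)$ entry of $NN^{\,n-1}$.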


The powers of the NN matrix for A = 2 given in (37) are readily
found, and an explicit solution to the learning curve is possible. The
power of the matrix with the extra states eliminated is given below.

(42)  $$(NN')^n = \begin{bmatrix} 1 & 0 \\ 1 - \left(\frac{1}{N^*}\right)^{n} & \left(\frac{1}{N^*}\right)^{n} \end{bmatrix}$$


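Thus the learning curve and total errors are obtained:

(43)  $$P(e_n) = \begin{cases} \dfrac{N^*}{N}, & n = 1, \\[6pt] \dfrac{1}{2}\left(\dfrac{1}{N^*}\right)^{n-2}, & n = 2, 3, \ldots \end{cases}$$

(44)  $$E(\text{total errors}) = \sum_{n=1}^{\infty} P(e_n) = \frac{N^*(3N^*-1)}{2N(N^*-1)}.$$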
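As a sanity check on the A = 2 example, the chain in (37) can be iterated directly and compared with the closed-form total in (44). This is only a sketch: the helper name `normative_error_curve` and the choice N* = 4 are assumptions made for illustration.

```python
from fractions import Fraction


def normative_error_curve(n_star, trials):
    """P(e_n) for n = 1..trials when A = 2 (D = 1), by iterating the chain."""
    size = n_star + 1
    # Row i of (37): go to state 0 with prob (N*-i)/N*, to state 1 with prob i/N*.
    nn = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        nn[i][0] += Fraction(n_star - i, n_star)
        nn[i][1] += Fraction(i, n_star)
    # In state i the subject guesses among i + 1 candidates, one of them correct.
    error = [Fraction(i, i + 1) for i in range(size)]
    state = [Fraction(0)] * size
    state[n_star] = Fraction(1)          # trial 1 is entered in state N*
    curve = []
    for _ in range(trials):
        curve.append(sum(s * e for s, e in zip(state, error)))
        state = [sum(state[i] * nn[i][j] for i in range(size))
                 for j in range(size)]
    return curve


curve = normative_error_curve(n_star=4, trials=12)
closed_total = Fraction(4 * (3 * 4 - 1), 2 * 5 * (4 - 1))   # N*(3N*-1) / (2N(N*-1))
print(float(sum(curve)), float(closed_total))
```

The partial sum over twelve trials already agrees with the closed form to about six decimal places, since the error probability shrinks by a factor 1/N* per trial after trial 2.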

This analytic solution for A = 2 is given only as an example; numerical
solutions for several specific N, A pairings are included in Part
Four/Two. They are used there to compare real subject performance with
the normative model.

At this point extensions of some models which do not reduce to the
normative one are discussed. The normative model will be used later in
extensions of other models.


One-element model. Several extensions of the one-element model
outlined in Part II, Section 1 are possible and are considered here.
An alternative generalization is discussed later as a special case of
another model. The assumptions of this version of the one-element
model are:

1.  On each trial, the subject is either unconditioned or conditioned
to exactly one of the N response alternatives. The unconditioned
state will be denoted $\bar{C}$; the state of being conditioned to the
correct response will be denoted C; and the state of being conditioned
to any of the N* incorrect responses will be denoted W.

2.  If the subject is in state $\bar{C}$, he makes each response with a
guessing probability, 1/N. Otherwise he makes the response to which
he is conditioned.

3.  On any given trial, with probability 1-c, the reinforcement
is ineffective and the state of conditioning is unchanged. With
probability c the reinforcement is effective. With effective
reinforcement, if the subject is in state $\bar{C}$, he conditions with
equal likelihood to any one, but exactly one, of the A reinforcers. If
he is in a conditioned state and the response to which he is conditioned
appears in the reinforcement set, he remains conditioned to that
response. If the response does not appear, and if the reinforcement is
effective, he rejects the response to which he was conditioned and
becomes conditioned to exactly one of the responses reinforced on that
trial.

4.  Entering trial one, the subject is in state $\bar{C}$.

Thus for the one-element model the transition matrix and error vector
are:


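(45)  $$\begin{array}{c|ccc} & C & W & \bar{C} \\ \hline C & 1 & 0 & 0 \\[4pt] W & \dfrac{c(N^*-D)}{N^*A} & 1 - \dfrac{c(N^*-D)}{N^*A} & 0 \\[8pt] \bar{C} & \dfrac{c}{A} & \dfrac{cD}{A} & 1-c \end{array}, \qquad E = \begin{bmatrix} 0 \\ 1 \\ \dfrac{N^*}{N} \end{bmatrix}$$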

By raising the transition matrix to the (n-1)st power, the
learning curve and expectation for total errors are found to be as
follows:

(46)  $$P(e_n) = \frac{N^*}{N}\left[1 - \frac{c\,(N^*-D)}{N^*A}\right]^{n-1}$$

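and, summing the geometric series in (46),

(47)  $$E(\text{total errors}) = \sum_{n=1}^{\infty} P(e_n) = \frac{(N^*)^2 A}{c\,N\,(N^*-D)}.$$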

No other statistics will be derived. The most obvious test, however,
is not the learning curve itself, but the prediction of the run of
errors while the subject is in state W. Once the subject moves out of
state $\bar{C}$, no successes are predicted until he learns.

It should be noted that the one-element model does not reduce to
the normative model for any value of c. As c increases, the probability
of conditioning wrongly increases at the same rate as the probability
of conditioning correctly.
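The learning curve in (46) can be verified by iterating the three-state chain of (45). The sketch below assumes D = A - 1 and N* = N - 1; the function name `one_element_curve` and the parameter values are hypothetical.

```python
from fractions import Fraction


def one_element_curve(n_resp, a_size, c, trials):
    """Return (curve, b): curve[k] is P(e_{k+1}); b is the W -> W probability."""
    n_star, d = n_resp - 1, a_size - 1
    b = 1 - c * Fraction(n_star - d, n_star * a_size)
    # States are ordered (C, W, unconditioned), following (45).
    chain = [
        [Fraction(1), Fraction(0), Fraction(0)],
        [1 - b, b, Fraction(0)],
        [c / a_size, c * d / a_size, 1 - c],
    ]
    error = [Fraction(0), Fraction(1), Fraction(n_star, n_resp)]
    state = [Fraction(0), Fraction(0), Fraction(1)]   # start unconditioned
    curve = []
    for _ in range(trials):
        curve.append(sum(s * e for s, e in zip(state, error)))
        state = [sum(state[i] * chain[i][j] for i in range(3)) for j in range(3)]
    return curve, b


curve, b = one_element_curve(n_resp=5, a_size=2, c=Fraction(1, 3), trials=8)
print([str(x) for x in curve[:3]])
```

With exact rational arithmetic the iterated values coincide with (N*/N) b^(n-1) on every trial, which is the form given in (46).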

An interesting extension of the one-element model has been worked
out for A that varies in size on each trial from 1 to N, with
probability $\pi_a$ that A = a. The basic assumptions of the model are
the same, but the state transition probabilities are altered by the
experimental change. Let

(48)  $$B = P(W_{n+1} \mid W_n) = \sum_{a=1}^{N} \pi_a\left[1 - \frac{c\,(N-a)}{N^* a}\right].$$

Then, if the learning curve is analogous to that with constant A, we
should expect

(49)  $$P(e_n) = \frac{N^*}{N}\, B^{\,n-1}.$$


We now prove this. Let $M = P(W_{n+1} \mid \bar{C}_n)$;
$P(\bar{C}_{n+1} \mid \bar{C}_n)$ remains 1-c. Then by raising the
transition matrix to the (n-1)st power, and assuming the subject starts
in state $\bar{C}$,

(50)  $$P(W_n) = \frac{M\left[(1-c)^{n-1} - B^{\,n-1}\right]}{1 - c - B}.$$


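But, since

(51)  $$B = \sum_{a=1}^{N} \pi_a\left[1 - \frac{c\,(N-a)}{N^* a}\right] = 1 - \frac{c}{N^*}\sum_{a=1}^{N} \pi_a\,\frac{N-a}{a}$$

and

(52)  $$M = \sum_{a=1}^{N} \pi_a\, c\left(1 - \frac{1}{a}\right),$$

we have $1 - c - B = -(N/N^*)\,M$, so that

(53)  $$P(W_n) = -\frac{N^*}{N}\left[(1-c)^{n-1} - B^{\,n-1}\right].$$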


Thus,

(54)  $$P(e_n) = P(W_n) + \frac{N^*}{N}\,P(\bar{C}_n) = P(W_n) + \frac{N^*}{N}(1-c)^{n-1} = \frac{N^*}{N}\,B^{\,n-1}.$$
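The claim that the learning curve keeps the form (N*/N) B^(n-1) when A varies from trial to trial can be checked numerically. The sketch below assumes that, for a set of size a, the transition probabilities are those of the constant-A model with A = a and D = a - 1; the name `variable_a_curve` and the distribution `pi` are made up for the illustration.

```python
from fractions import Fraction


def variable_a_curve(n_resp, c, pi, trials):
    """Error probabilities when the set size a is drawn with probability pi[a].
    Returns (curve, b) with b = P(W -> W)."""
    n_star = n_resp - 1
    # P(W -> C): the conditioned response is absent and the new one is correct.
    w_to_c = sum(p * c * Fraction(n_resp - a, n_star * a) for a, p in pi.items())
    b = 1 - w_to_c
    # P(unconditioned -> W): conditioning lands on one of the a - 1 distractors.
    m = sum(p * c * Fraction(a - 1, a) for a, p in pi.items())
    chain = [
        [Fraction(1), Fraction(0), Fraction(0)],   # C is absorbing
        [w_to_c, b, Fraction(0)],                  # W
        [c - m, m, 1 - c],                         # unconditioned
    ]
    error = [Fraction(0), Fraction(1), Fraction(n_star, n_resp)]
    state = [Fraction(0), Fraction(0), Fraction(1)]
    curve = []
    for _ in range(trials):
        curve.append(sum(s * e for s, e in zip(state, error)))
        state = [sum(state[i] * chain[i][j] for i in range(3)) for j in range(3)]
    return curve, b


pi = {1: Fraction(1, 4), 2: Fraction(1, 2), 4: Fraction(1, 4)}   # hypothetical
curve, b = variable_a_curve(n_resp=5, c=Fraction(2, 5), pi=pi, trials=8)
print(str(b), [str(x) for x in curve[:3]])
```

Under these assumptions the iterated error probabilities equal (N*/N) B^(n-1) exactly, for any choice of the size distribution.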


Linear models. Let $p_n = (p_{1,n}, p_{2,n}, \ldots, p_{N,n})$ represent
the response probability vector on trial n. That is, $p_{i,n}$ is the
probability of making the ith response on trial n. N is the number of
response alternatives. A linear model for learning asserts that
$p_{n+1}$ is a linear function of $p_n$; the exact nature of that linear
function depends on the reinforcement. Consider as an example a
situation with the two response alternatives, $a_1$ and $a_2$, where
$a_1$ is always correct. The linear model for this situation is
represented by a transformation matrix,

$$L = [l_{ij}] = \begin{bmatrix} 1-\alpha & \alpha \\ \theta & 1-\theta \end{bmatrix}.$$

The vector $p_{n+1}$ is given by the following expression:

(55)  $$(p_{1,n+1},\ p_{2,n+1}) = (p_{1,n},\ p_{2,n}) \begin{bmatrix} 1-\alpha & \alpha \\ \theta & 1-\theta \end{bmatrix}.$$

The elements of the matrix L clearly must be independent of $p_n$ or
the model would be nonlinear. For learning to occur, θ must be greater
than α.

In the example above only one reinforcement is given (i.e., this
is the situation considered in Part II, Section 1); hence, only one
matrix is needed. In general the transformation matrix must be indexed
by the reinforcement E. The class of all linear models corresponds to
the class of all transition matrices $L(E) = [l_{ij}(E)]$ such that:

(56)  $$l_{ij}(E) \ge 0 \quad \text{for } 1 \le i, j \le N$$

and

(57)  $$\sum_{j=1}^{N} l_{ij}(E) = 1 \quad \text{for } 1 \le i \le N,$$

where E is a particular reinforcement. A linear model specifies for
each reinforcement E a matrix L(E) such that

(58)  $$p_{n+1} = p_n\, L(E),$$

as well as a starting vector, $p_1$. Without placing further constraints
on L, we have an N x (N-1) parameter model. To pare these down to a
single parameter, we make four further assumptions. The first, third,
and fourth assumptions seem indispensable; relaxing the second would
give a somewhat more general model. The assumptions are these.

1.  Relabeling the response alternatives in no way affects the
predictions of response probabilities.

2.  If $a_i \in E_n$, where $E_n$ is the subset of the response
alternatives in the reinforcement set on trial n, then
$l_{ii}(E_n) = 1$. In the example of (55), this corresponds to assuming
α = 0 instead of simply assuming α < θ.

3.  If $a_i \in E_n$ then $p_{i,n+1} \ge p_{i,n}$.

4.  $p_1 = (1/N, 1/N, \ldots, 1/N)$.


The preceding assumptions limit us to two distinct one-parameter
models. To see this, consider the N = 4 with A = 2 case. For convenience
we consider that the first response is correct, i.e., it is always in
the reinforcement set. Each of the remaining three responses appears
in the reinforcement set with probability 1/3. The two possible
reinforcement matrices for when the first (correct) response and the
second response are reinforced are given by:

(59)  $$L^{(1)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \alpha/2 & \alpha/2 & 1-\alpha & 0 \\ \alpha/2 & \alpha/2 & 0 & 1-\alpha \end{bmatrix} \quad \text{and} \quad L^{(2)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ \alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\ \alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha \end{bmatrix}.$$

The values of the first two rows follow from Assumption 2. Since the
models are linear, none of the $p_{i,n}$ can appear in the matrices. From
Assumption 3, 0 < α ≤ 1 and, from Assumption 1, the constant α appearing
in row 3 must have the same value as the α in row 4. The transition
matrix $L^{(1)}$ follows if we assume that the decrement in any response
alternative not reinforced is spread evenly among those that are
reinforced; $L^{(2)}$ follows if we assume that the decrement is spread
among all the rest.

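Let us now derive the learning curve for the $L^{(1)}$ transition matrix.
As there are three equiprobable reinforcements, and again assuming that
response 1 is correct,

(60)  $$\begin{aligned} p^{(1)}_{1,n+1} ={}& \tfrac{1}{3}\left(1\cdot p_{1,n} + 0\cdot p_{2,n} + \tfrac{\alpha}{2}\, p_{3,n} + \tfrac{\alpha}{2}\, p_{4,n}\right) \\ &+ \tfrac{1}{3}\left(1\cdot p_{1,n} + \tfrac{\alpha}{2}\, p_{2,n} + 0\cdot p_{3,n} + \tfrac{\alpha}{2}\, p_{4,n}\right) \\ &+ \tfrac{1}{3}\left(1\cdot p_{1,n} + \tfrac{\alpha}{2}\, p_{2,n} + \tfrac{\alpha}{2}\, p_{3,n} + 0\cdot p_{4,n}\right), \end{aligned}$$

or

(61)  $$p^{(1)}_{1,n+1} = p_{1,n} + \frac{\alpha}{3}\,(1 - p_{1,n}).$$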

From this recursion and Assumption 4, it follows that

(62)  $$p^{(1)}_{1,n} = 1 - \frac{3}{4}\left(1 - \frac{\alpha}{3}\right)^{n-1}.$$


Using similar arguments with the $L^{(2)}$ transition matrix, we find that

(63)  $$p^{(2)}_{1,n} = 1 - \frac{3}{4}\left(1 - \frac{2\alpha}{9}\right)^{n-1}.$$

These results generalize to arbitrary N and A. We continue to
assume that response 1 is correct. The following recursion gives
$p_{1,n+1}$:

(64)  $$p_{1,n+1} = p_{1,n} + \sum_{i=2}^{N} p_{i,n}\cdot\frac{\alpha}{J}\cdot K\cdot L,$$

where L is the probability of each reinforcement set, K is the number
of times each $p_{i,n}$ appears in the generalization of the sum given in
(60), and J is the number by which α must be divided. This number
depends on whether the decrement is spread among all response
alternatives or only among those reinforced. L is equal to
$\binom{N-1}{A-1}^{-1}$; K equals $\binom{N-2}{A-1}$; and J equals A or
N-1.

Under the assumption that the decrement is spread only among those
reinforced, i.e., J = A, the learning curve is:

(65a)  $$p^{(1)}_{1,n} = 1 - \frac{N-1}{N}\left(1 - \frac{\alpha}{A}\cdot\frac{N-A}{N-1}\right)^{n-1}.$$

Under the alternative assumption, J = N-1, the learning curve is:

(65b)  $$p^{(2)}_{1,n} = 1 - \frac{N-1}{N}\left(1 - \frac{\alpha\,(N-A)}{(N-1)^2}\right)^{n-1}.$$
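The two linear-model learning curves can be checked by averaging the update over all equiprobable reinforcement sets. This is a rough sketch under the stated assumptions (response 0 plays the role of the always-reinforced correct response, and the closed forms are those for J = A and J = N - 1); the function names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations


def expected_step(p, a_size, alpha, spread_to_all):
    """One expected update of the response probabilities, averaged over all
    reinforcement sets of size a_size that contain response 0."""
    n = len(p)
    sets = [(0,) + rest for rest in combinations(range(1, n), a_size - 1)]
    new_p = [Fraction(0)] * n
    for reinforced in sets:
        q = list(p)
        for i in range(n):
            if i in reinforced:
                continue                      # reinforced responses lose nothing
            loss = alpha * p[i]
            q[i] -= loss
            # J = N - 1: spread to every other response; J = A: only to reinforced.
            targets = ([r for r in range(n) if r != i]
                       if spread_to_all else list(reinforced))
            for r in targets:
                q[r] += loss / len(targets)
        for i in range(n):
            new_p[i] += q[i] / len(sets)
    return new_p


def correct_prob(n_resp, a_size, alpha, spread_to_all, trial):
    """Probability of the correct response on the given trial (trial 1 is uniform)."""
    p = [Fraction(1, n_resp)] * n_resp
    for _ in range(trial - 1):
        p = expected_step(p, a_size, alpha, spread_to_all)
    return p[0]


alpha = Fraction(1, 2)
print(str(correct_prob(4, 2, alpha, False, 6)), str(correct_prob(4, 2, alpha, True, 6)))
```

For N = 4 and A = 2 the per-trial rates are α/3 and 2α/9, matching (62) and (63); the same routine can be called with other N and A to compare against the general curves.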


Before we leave the linear models, consider a geometric
interpretation for the N = 3, A = 2 case (in which it makes no
difference whether the decrement is spread to only those reinforced or
to all). The triangle ABC in Figure 3 represents all possible values of
$p_n$; one particular value is shown. Assume that responses 1 and 2 are
reinforced. Let S be the point on the line AB such that the vector
$S - p_n$ is perpendicular to AB. Then the linear matrix models
previously developed are equivalent to the geometric assertion that
$p_{n+1} = p_n + \alpha\,(S - p_n)$. Thus the area of triangle $A p_n B$ is
decreased by a fixed fraction, whereas in the determinate case, a length
was decreased by a fraction α.


Fig. 3. Geometric interpretation of the linear models

General Dirichlet model. As before, there are N response
alternatives, A of which are reinforced on every trial. One of the A is
correct; the remaining A-1 are chosen randomly from the N-1 incorrect
responses. Let the vector
$r^{(n)} = (r_1^{(n)}, r_2^{(n)}, \ldots, r_N^{(n)})$ give the
probabilities of making various responses on trial n. Clearly,

(66)  $$\sum_{j=1}^{N} r_j^{(n)} = 1 \quad \text{and} \quad r_j^{(n)} \ge 0 \ \text{ for } 0 < j \le N.$$

Let R be the set of all possible vectors $r^{(n)}$; R is, then, a simplex
in N-space. Our purpose first is to describe qualitatively the effect
of reinforcement on $r^{(n)}$. The vector $r^{(n+1)}$ will be some point
in the A-dimensional simplex in R whose points are linear combinations
of $r^{(n)}$ and the unit vectors corresponding to the responses
reinforced. This simplex is denoted $A^*$. Figure 4 shows the case
N = 3, A = 2 when responses 1 and 3 are reinforced.

The basic assumption of the general Dirichlet model is that the
value of $r^{(n+1)}$ given $r^{(n)}$ is a random variable distributed
according to an A-variate Dirichlet density over the region $A^*$. A
further assumption is that this density is symmetric with respect to
the responses reinforced. More explicitly, the assumptions of the
theory are:

1.  The state the subject is in on trial n is indexed by a vector
$r^{(n)} = (r_1^{(n)}, r_2^{(n)}, \ldots, r_N^{(n)})$ whose components are
such that Equation (66) is satisfied.

2.  If the subject is in state $r^{(n)}$, he makes response i with
probability $r_i^{(n)}$.

3.  The density for $r^{(n+1)}$ given $r^{(n)}$ is an A-variate Dirichlet
density over the previously defined region $A^*$ with parameters
$\alpha_1, \alpha_2, \ldots, \alpha_A$, and β. (See Wilks [73] for a
general discussion of the Dirichlet density.) Further,
$\alpha_i = \alpha_j = \alpha$ for $1 \le i, j \le A$.

4.  $r^{(1)} = (1/N, 1/N, \ldots, 1/N)$.

The A-variate Dirichlet density is defined on the standard region
X such that $x_i \ge 0$ for $1 \le i \le A$ and $\sum_i x_i \le 1$. The
algebraic tangle involved in translating the region X into the region
$A^*$ may be avoided by considering only the marginal density for the
probability of the correct response (which probability will be denoted
$r_c^{(n)}$). Consider Figure 5. The region DEC is the straight-on
projection of $A^*$ (from Figure 4) onto the $r_1$-$r_3$ plane. The region
X is the region BDC. Let the correct response be 3. All we need know is
the marginal density along the line DE. From Wilks [1962, Thm. 7.7.2],
we find that in this case, with A = 2, the marginal is a beta density
with parameters α and α + β and hence with expectation α/(2α + β). In
general, the marginal distribution is a beta distribution with
parameters α and (A-1)α + β and hence with expectation α/(Aα + β).

From here the derivation of the learning curve strictly parallels
the development in Subsection 1.2.

(67)  $$E\left(r_c^{(n+1)}\right) = E\left(r_c^{(n)}\right) + \frac{\alpha}{A\alpha+\beta}\left[1 - E\left(r_c^{(n)}\right)\right].$$

Repeating the arguments of Part II, Section 1 we find, for A ≤ N:

(68)  $$p(e_n) = \frac{N-1}{N}\left(1 - \frac{\alpha}{A\alpha+\beta}\right)^{n-1}.$$

Notice that for fixed α, β, and N, increasing A decreases the learning
rate, as it should.
Elimination models. All elimination models generalize similarly
from the models in Part II, Section 1. First the generalization of the
one-parameter model is given, followed by the others in less detail.

The assumptions for the one-parameter elimination model are the
same as those in Part II, Section 1 with the condition that the subject
can eliminate only the responses on a trial that have been shown to be
incorrect. With determinate reinforcement this condition could be
introduced, but it would be inconsequential because on every trial it
is possible to transit to state 0, that is, it is possible to eliminate
all wrong responses. With multiresponse reinforcement the subject
cannot always eliminate all wrong responses. If the subject narrows
the correct response down to a, f, or g, and is shown a reinforcement
set of a, b, e, and g, then the best he can do is to eliminate f. Thus
the new transition probabilities are tied to the normative transition
probabilities for the estimates of the best possible, or normative,
move. More explicitly, letting $TT = [tt_{ij}]$ be the multiple response
elimination transition matrix,


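(69)  $$\begin{aligned} tt_{ij} &= P(\text{state}_{n+1} = j \mid \text{state}_n = i) \\ &= \sum_{k=0}^{j} P(\text{state}_{n+1} = j \mid \text{state}_n = i,\ \text{normative} = k)\cdot P(\text{norm.} = k \mid \text{state}_n = i). \end{aligned}$$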


The sum is only to j, as the subject can move no farther than the
normative move. The second term in the sum is obviously just the
normative transition probability. The first term is the probability
that if you start with i wrong responses and i-k can be eliminated, j
wrong responses remain. Thus, j-k responses which might have been
eliminated were not. This term is equivalent to the determinate
reinforcement probability of transit from state i-k to state j-k.

Thus

(70)  $$tt_{ij} = \sum_{k=0}^{j} t_{i-k,\,j-k}\; nn_{ik}.$$

Here the use of the 'extra' normative states is seen. If the subject
by incompletely eliminating wrong responses is in a state between D
and N*, the normative probabilities for moves out of these states are
needed. While $TT = [tt_{ij}]$ can be written in terms of N, A, and ε
instead of as it was in (70), the terms do not reduce considerably,
and we feel the above formulation is conceptually clearer.
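The composition in (70) can be sketched in a few lines. The determinate-reinforcement matrix T is defined in Part II, Section 1 and is not reproduced here, so the two matrices used below ("no elimination" and "perfect elimination") are hypothetical limiting cases chosen only to exercise the formula; the function names and sizes are likewise illustrative.

```python
from fractions import Fraction
from math import comb


def normative_matrix(n_star, d):
    """Hypergeometric normative matrix, as in (36)."""
    size = n_star + 1
    return [[Fraction(comb(i, j) * comb(n_star - i, d - j), comb(n_star, d))
             if j <= min(i, d) else Fraction(0)
             for j in range(size)] for i in range(size)]


def compose(t, nn):
    """tt[i][j] = sum over k = 0..j of t[i-k][j-k] * nn[i][k], following (70)."""
    size = len(nn)
    tt = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):          # the count of wrong responses never grows
            for k in range(j + 1):
                tt[i][j] += t[i - k][j - k] * nn[i][k]
    return tt


size = 5                                   # states 0..N* with N* = 4 (assumed)
nn = normative_matrix(n_star=4, d=2)
no_elimination = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
perfect = [[Fraction(int(j == 0)) for j in range(size)] for i in range(size)]

tt_none = compose(no_elimination, nn)
tt_perfect = compose(perfect, nn)
print(tt_perfect == nn, tt_none == no_elimination)
```

With perfect elimination the composed matrix coincides with the normative matrix, and with no elimination it is the identity, which is the behavior the verbal argument leads one to expect.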

The error probabilities, given the state, are the same as before,
and thus the learning curve is directly analogous:

(71)  $$P(e_n) = S_1\, TT^{\,n-1} E = \sum_{j=0}^{N^*} tt^{(n-1)}_{N^*,j}\,\frac{j}{j+1}.$$

The generalization of the AE-2 and AE-3 models is the same as
that for the one-parameter elimination model. In (70), for tt
substitute tt' or tt'', for t substitute t' or t'', and in (71) make the
same substitutions. TT' and TT'' are then the new transition matrices
for the multiresponse reinforcement version of the AE-2 and AE-3
models, respectively.

The AE-2 model with ε set equal to 0 (no elimination occurs) is
an alternative extension of the one-element model. Here with
probability c the subject acquires, or conditions to, the entire group
of responses that were both in memory and in the current reinforcement
set. This generalization reduces to the normative model if c is set
equal to 1.

The generalization of the elimination-forgetting model is
comparable to that for the other three elimination models. Since the
incomplete information affects only the number of responses possible
for elimination, not the forgetting given the state immediately
following reinforcement, only the T matrix, i.e., the elimination
matrix, is affected. The effect is precisely that of the elimination
model. Thus if TT is defined as in (70) the formulation of the model is
the same as for the determinate reinforcement case, substituting TT
for T.

Conditioning-strength model. The generalization of this model is
precisely parallel to the generalization of the elimination model. It
does not reduce to the normative model, because of the difference in
response assumptions. Therefore as ε approaches 1, and the transition
matrix approaches the normative matrix, the conditioning-strength model
predicts learning faster than that predicted by the normative theory.
Needless to say, this prediction could not hold in practice, and the
model needs investigation for more intermediate values of ε.
3.  Paired-Associate Learning with a Continuum of Response Alternatives

In the experimental paradigms discussed so far, subjects select
their response from one of a finite (and usually small) set of
alternatives. Linear and stimulus-sampling models for situations
involving a continuum of response alternatives have been proposed by
Suppes [63, 64]. A brief description of experiments run by Suppes and
Frankmann [68] and by Suppes, Rouanet, Levine, and Frankmann [70] gives
a feel for the type of experimental setup we shall now consider.

In these experiments subjects sat facing a large circular disk.
After the subject responded by setting a pointer to a position on the
circumference of the disk, he was reinforced by a light that appeared
at some point on the circumference. As the subject saw exactly where
the light flashed, i.e., what his response 'should' have been,
reinforcement was determinate. In these studies reinforcement was also
noncontingent. The reinforcement density in the 1961 study was
triangular on 0-2π; in the 1964 study it was bimodal, consisting of
triangular sections on 0-π and π-2π. By reinforcement density we mean
the probability density function from which reinforcement is drawn. For
example, if f(y) is the reinforcement density, the probability that the
reinforcement will appear between a and b is $\int_a^b f(y)\,dy$, and
this probability is contingent on neither trial number nor the
subject's previous response.

The  experimental  paradigm  just  described  corresponds  more  fully 
to  probability  learning  than  to  PAL  and  will  be  considered  again  later. 
Variations  of  it,  however,  correspond  to  PAL. 
Complete information. We consider a list of length L of distinct
stimuli (trigrams, for example). Each stimulus corresponds to a single,
fixed region on the circumference of the experimental disk. The
subject is shown the stimulus, indicates his response with his pointer,
is shown the region considered correct, and then is shown the next
stimulus. His response is considered correct if it falls in the
reinforced region; otherwise, it is incorrect. We wish now to derive a
learning curve for the subject.

Denote the center of the correct region by e and let the correct
region extend a distance α on either side of e. The subject's response
is given by a density $r_n(x)$ for trial n. If the subject is known to
be conditioned to some point z, then the density for his response is
a smearing density k(x|z). The parameter z itself is a random variable,
and we shall denote its density on trial n by $g_n(z)$. The conditioning
assumption we shall make is that with probability 1-θ the parameter of
the subject's smearing distribution makes no change after reinforcement,
and with probability θ, z is distributed by an 'effective reinforcement
density' f(y). Subsequently, we shall consider two candidates for
f(y). First, observe what happens to the density $g_n(z)$.
(All these matters are discussed in detail in Suppes [1959] with a
different interpretation of the effective reinforcement density.)

The density $g_n$ changes as follows:

(72)  $$g_{n+1}(z) = (1-\theta)\, g_n(z) + \theta\, f(z).$$

If we assume that $g_1(z)$ is uniform (= 1/2π), we find from the above
recursion that

(73)  $$g_n(z) = \frac{(1-\theta)^{n-1}}{2\pi} + \left[1 - (1-\theta)^{n-1}\right] f(z).$$
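The step from the recursion (72) to the closed form (73) can be confirmed pointwise. The sketch below fixes a single point z and treats the value f(z) as a given number; the variable names and the numerical values of θ and f(z) are assumptions for the example.

```python
import math


def g_recursive(n, theta, f_z):
    """Apply g_{n+1}(z) = (1 - theta) g_n(z) + theta f(z), starting from the
    uniform density g_1(z) = 1 / (2 pi), at one fixed point z."""
    g = 1.0 / (2.0 * math.pi)
    for _ in range(n - 1):
        g = (1.0 - theta) * g + theta * f_z
    return g


def g_closed(n, theta, f_z):
    """Closed form (73) evaluated at the same point."""
    decay = (1.0 - theta) ** (n - 1)
    return decay / (2.0 * math.pi) + (1.0 - decay) * f_z


theta, f_z = 0.2, 1.0 / (2.0 * 0.3)   # hypothetical: uniform f on a region of half-width 0.3
for n in (1, 2, 5, 20):
    print(n, g_recursive(n, theta, f_z), g_closed(n, theta, f_z))
```

Both routines agree to rounding error, and as n grows the density at z approaches f(z), since the weight on the initial uniform density decays like (1-θ)^(n-1).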




The probability of being correct on trial n, $p(S_n)$, is given by

(74)  $$p(S_n) = \int_0^{2\pi}\!\!\int_{e-\alpha}^{e+\alpha} k(x \mid z)\, g_n(z)\, dx\, dz.$$


Two plausible assumptions concerning f(y) are:

(75)  $$f_1(y) = \delta(y - e),$$

or

(76)  $$f_2(y) = \begin{cases} 1/2\alpha, & e-\alpha < y < e+\alpha, \\ 0, & \text{elsewhere.} \end{cases}$$


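If conditioning occurs, $f_1(y)$ asserts that z becomes e; $f_2(y)$ asserts
that z becomes uniformly distributed over the correct region. The
learning curves for $f_1(y)$ and $f_2(y)$ follow. For $f_1$,

(77)  $$p(S_n) = (1-\theta)^{n-1}\,\frac{\alpha}{\pi} + \left[1 - (1-\theta)^{n-1}\right]\int_{e-\alpha}^{e+\alpha} k(x \mid e)\, dx,$$

and for $f_2$,

(78)  $$p(S_n) = (1-\theta)^{n-1}\,\frac{\alpha}{\pi} + \left[1 - (1-\theta)^{n-1}\right]\frac{1}{2\alpha}\int_{e-\alpha}^{e+\alpha}\!\!\int_{e-\alpha}^{e+\alpha} k(x \mid z)\, dx\, dz.$$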


For  the  present,  we  shall  derive  no  further  statistics  for  these  models. 

(The function δ(·) is the Dirac delta function.)

Incomplete information. The experiment is organized so that a
total of A regions of fixed width 2α are presented to the subject each
time he is reinforced. One of these regions is fixed with center at
y; the others have their centers uniformly distributed on 0-2π each
trial. (Hence, there can be overlap among the reinforcers.) A list
of stimuli is assumed. The subject starts with z uniformly distributed
on the region 0-2π. The conditioning assumptions are: (i) if the
subject responds in a reinforced region, conditioning remains unchanged;
and (ii) if he does not, with probability (1-θ) his conditioning
remains unchanged, and with probability θ it is spread uniformly over
the reinforced regions. Let us start with some definitions. The total
area expected to be covered by reinforcers on any given trial is v,
where

(79)  $$2\pi - v = \int_0^{2\pi}\left(\frac{2\pi - 2\alpha}{2\pi}\right)^{A} dt,$$

and hence,

(80)  $$v = 2\pi\left[1 - \left(\frac{2\pi - 2\alpha}{2\pi}\right)^{A}\right].$$

Let $s_n$ denote the event of responding in a reinforced region on trial
n, $W_n$ the event of being wrongly conditioned on trial n, and $C_n$ the
event of being correctly conditioned on trial n (i.e., z is in the one
'correct' region). Then,


(81)  $$P(s_n \mid C_n) = \frac{1}{2\alpha}\int_{y-\alpha}^{y+\alpha}\!\!\int_{y-\alpha}^{y+\alpha} k(x \mid z)\, dx\, dz = \gamma, \text{ by definition, and}$$


(821  P  (s  jW  )  -  1.(2-  -  2 or)  -  v/2~  . 

Equation (81) is an approximation, because there is some (small)
probability that the subject will guess outside the correct region and
be reinforced by one of the distractors. Also, we can write the
transition probabilities:

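(83)  $$\begin{aligned} P(C_{n+1} \mid C_n) &= P(C_{n+1} \mid C_n s_n)\,P(s_n \mid C_n) + P(C_{n+1} \mid C_n \bar{s}_n)\,P(\bar{s}_n \mid C_n) \\ &= \gamma + (1-\theta)(1-\gamma) + (1-\gamma)\,\theta\,\frac{2\alpha}{v} = m, \text{ by definition,} \end{aligned}$$

and

$$P(C_{n+1} \mid W_n) = \left(1 - \frac{v}{2\pi}\right)\theta\,\frac{2\alpha}{v} = n, \text{ by definition.}$$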


The transition matrix between states and error probability vector are,
therefore,

(84)  $$T = \begin{bmatrix} m & 1-m \\ n & 1-n \end{bmatrix}, \qquad E = \begin{bmatrix} \gamma \\ v/2\pi \end{bmatrix}.$$

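If $S_n$ is the vector that represents the probabilities of being in the
2 states on trial n, then $S_1 = \left(\frac{\alpha}{\pi}, \frac{\pi-\alpha}{\pi}\right)$.
The learning curve is:

(85)  $$P(x \in \text{correct region}) = S_1\, T^{\,n-1} \begin{bmatrix} \gamma \\ v/2\pi \end{bmatrix}.$$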


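We shall complete this discussion by deriving the expression for
the powers of T. The eigenvalues of T can be shown to be $\lambda_1 = 1$
and $\lambda_2 = m-n$. Let Q be the matrix of the eigenvectors generated
from $\lambda_1$ and $\lambda_2$. Then,

(86)  $$Q = \begin{bmatrix} 1 & \dfrac{m-1}{n} \\ 1 & 1 \end{bmatrix} \quad \text{and} \quad Q^{-1} = \frac{n}{n-m+1}\begin{bmatrix} 1 & \dfrac{1-m}{n} \\ -1 & 1 \end{bmatrix}.$$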

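It is a theorem of matrix analysis that

(87)  $$T^n = Q\,\Lambda^n\,Q^{-1}, \quad \text{where } \Lambda = \begin{bmatrix} 1 & 0 \\ 0 & m-n \end{bmatrix}.$$

By multiplying and simplifying as much as possible, we find

(88)  $$T^n = \frac{1}{n-m+1}\begin{bmatrix} n + (1-m)(m-n)^n & 1 - m - (1-m)(m-n)^n \\ n - n\,(m-n)^n & 1 - m + n\,(m-n)^n \end{bmatrix}.$$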
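The closed form for the powers of T can be compared against repeated multiplication. In the sketch below the exponent is called `k` to avoid the clash with the matrix entry n; the function names and the values chosen for m and n are hypothetical.

```python
from fractions import Fraction


def power_direct(m, n, k):
    """T^k by repeated multiplication, with T = [[m, 1-m], [n, 1-n]]."""
    t = [[m, 1 - m], [n, 1 - n]]
    result = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
    for _ in range(k):
        result = [[sum(result[i][h] * t[h][j] for h in range(2))
                   for j in range(2)] for i in range(2)]
    return result


def power_closed(m, n, k):
    """T^k from the eigen-decomposition, eigenvalues 1 and m - n."""
    lam = (m - n) ** k
    scale = n - m + 1
    return [[(n + (1 - m) * lam) / scale, ((1 - m) - (1 - m) * lam) / scale],
            [(n - n * lam) / scale, ((1 - m) + n * lam) / scale]]


m, n = Fraction(3, 4), Fraction(1, 5)
print(power_closed(m, n, 5) == power_direct(m, n, 5))
```

As k grows the second eigenvalue term vanishes, and both rows tend to the stationary vector (n, 1-m)/(n-m+1).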
II.  PROBABILITY LEARNING

If an experiment is constructed so that the only reward a subject
receives is that of being correct, the reinforcements can be
characterized by the amount of information he receives concerning the
correct response. More specifically, if Ω is the set of response
alternatives and $\mathcal{E}$ is the set of possible reinforcements,
then $\mathcal{E}$ is the set of all subsets (power set) of Ω. The
notion here is that after responding on a given trial the subject is
shown some $e_i \in \mathcal{E}$ and told that the correct response for
that trial is included in $e_i$. In the general noncontingent case
(i.e., the reinforcement is not contingent on the subject's response),
each $e_i$ will be shown with a probability $\pi_i$ independent of the
subject's prior responses and the trial number.

We now consider the experimental paradigm in which the number of
responses in the reinforcement set is a constant, j (1 ≤ j ≤ N, where N
is the cardinality of Ω), but no one response is necessarily always
present. Thus, the paradigm is that of probability learning.

Previous theories of probability learning have dealt primarily
with the case j = 1. We shall present theories for arbitrary j. The
first theory presented is attractive since it implies a natural
generalization of the well-known probability matching theorem.
Unfortunately, this theory is intuitively unacceptable for extreme
values of the π's. The second theory gives the probability matching
theorem for j = 1, but unless j = 1 or N-1, it is mathematically
untractable.

These two theories are essentially all or none; we shall also
discuss a third, linear theory.
1.  Probability Learning Without Permanent Conditioning

The assumptions of this theory are:

1.  On every trial the stimulus element is conditioned to exactly
one of the N responses, or it is unconditioned. At the outset it is
unconditioned.

2.  After reinforcement, the stimulus-element conditioning remains
unaltered with probability 1-θ. The stimulus element becomes
conditioned to any one of the A members of the reinforcement set with
probability θ/A.

3.  If unconditioned, the subject makes each response with a
guessing probability of 1/N; if the subject is conditioned, he makes the
response he is conditioned to.


We shall designate the set of possible responses by
$A = \{a_1, a_2, \ldots, a_N\}$. The probability of response $a_i$ on
trial n is denoted by $p_{i,n}$. The asymptotic probability of $a_i$,
i.e., $\lim_{n\to\infty} p_{i,n}$, is denoted $p_i$. By relabeling, any
response can be denoted $a_1$; hence, we shall derive only $p_1$. As
each reinforcement set has A members, there are a total of
$\binom{N}{A} = N!/A!\,(N-A)!$ different reinforcement sets. Of these
reinforcement sets a number $k = \binom{N-1}{A-1}$ will contain $a_1$.
We shall denote by $e_1, e_2, \ldots, e_k$ those reinforcement sets that
contain $a_1$; the probabilities that these reinforcement sets will
occur are $\pi_1, \pi_2, \ldots, \pi_k$.

Theorem 3 (probability matching). Assumptions 1 to 3 imply that

(89)  $$p_1 = \frac{1}{A}\sum_{i=1}^{k} \pi_i.$$
Proof: Let $C_{i,n}$ be the event of being conditioned to $a_i$ on trial
n, and let $p(C_{i,n})$ be the probability of this event. By the theorem
of total probability and by assuming that n is sufficiently large, we
can neglect the possibility of being unconditioned. Thus,

(90)  $$p(C_{1,n+1}) = \sum_{j=1}^{N} p(C_{1,n+1} \mid C_{j,n})\; p(C_{j,n}).$$

The value of $p(C_{1,n+1} \mid C_{1,n})$ is obtained by noting that one
can be in state $C_1$ on n+1 after being in state $C_1$ on n if either
the subject's conditioning is unaltered (with probability 1-θ) or if
$a_1$ is in the reinforcement set shown, and he becomes conditioned to
it (with probability $(\theta/A)\sum_{i=1}^{k}\pi_i$). Thus,

(91)  $$p(C_{1,n+1} \mid C_{1,n}) = (1-\theta) + \frac{\theta}{A}\sum_{i=1}^{k}\pi_i.$$

If j ≠ 1, the subject can be in state $C_1$ on n+1 only if $a_1$ is in
the reinforcement set shown and he becomes conditioned to it. Thus,

(92)  $$p(C_{1,n+1} \mid C_{j,n}) = \frac{\theta}{A}\sum_{i=1}^{k}\pi_i.$$

For large n, $p(C_{1,n+1}) = p(C_{1,n}) = p_1$; hence, (90) can be
written in the following way:

(93)  $$p_1 = \left[(1-\theta) + \frac{\theta}{A}\sum_{i=1}^{k}\pi_i\right] p_1 + \sum_{j=2}^{N}\left(\frac{\theta}{A}\sum_{i=1}^{k}\pi_i\right) p_j,$$

or

(94)  $$p_1 = p_1 - \theta p_1 + \frac{\theta}{A}\left(\sum_{i=1}^{k}\pi_i\right)\left(\sum_{j=1}^{N} p_j\right).$$

Since $\sum_{j=1}^{N} p_j = 1$,

(95)  $$\theta\, p_1 = \frac{\theta}{A}\sum_{i=1}^{k}\pi_i,$$

which, by cancelling θ, gives the desired result:

(96)  $$p_1 = \frac{1}{A}\sum_{i=1}^{k}\pi_i.$$

Q.E.D.


Some special cases of the above are: $N = 2$ and $A = 1$; here $p_1 = \pi_1$. For $N = 3$ and $A = 2$, $p_1 = (\pi_1 + \pi_2)/2$; for $N = 6$ and $A = 3$, $p_1 = (\pi_1 + \pi_2 + \cdots + \pi_{10})/3$.

Let us look at the case $N = 3$ and $A = 2$ in a little more detail; $e_1 = \{a_1, a_2\}$, $e_2 = \{a_1, a_3\}$, and $e_3 = \{a_2, a_3\}$. Assume that $\pi_1 = \pi_2 = .5$ and $\pi_3 = 0$. Clearly, then, $p_1 = .5$ and $p_2 = p_3 = .25$. Notice that since $\pi_3 = 0$, $a_1$ is always in the reinforcement set. Data from the experiment reported in the Appendix show that when one response is always reinforced (paired-associate learning), subjects learn to select it only. Hence the empirical value of $p_1$ is 1. It is obvious, then, that the theory just presented will break down if one or more of the $\pi_i$'s tends to zero; how well it will do for nonextreme values of the $\pi_i$'s remains to be seen.
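The case $N = 3$, $A = 2$ can be checked numerically. The following is a minimal Python sketch and not part of the original analysis: the function names, the value $\theta = 0.2$, and the number of trials are illustrative assumptions. It iterates the state probabilities implied by Assumptions 1 to 3 (starting from the unconditioned state) and compares the result with the probability-matching prediction $p_i = (1/A)\sum \pi_e$, the sum running over the sets $e$ that contain $a_i$. Responses are indexed from 0 in the code, and the sets are listed in the order $\{a_1,a_2\}$, $\{a_1,a_3\}$, $\{a_2,a_3\}$.

```python
from itertools import combinations


def reinforcement_sets(n_responses, set_size):
    """All subsets of {0, ..., N-1} with A members, in lexicographic order."""
    return list(combinations(range(n_responses), set_size))


def matching_prediction(n_responses, set_size, pi):
    """p_i = (1/A) * (sum of pi_e over the sets e that contain response i)."""
    sets = reinforcement_sets(n_responses, set_size)
    return [sum(p for e, p in zip(sets, pi) if i in e) / set_size
            for i in range(n_responses)]


def asymptote_by_iteration(n_responses, set_size, pi, theta, n_trials=2000):
    """Iterate the conditioning-state probabilities under Assumptions 1-3."""
    sets = reinforcement_sets(n_responses, set_size)
    cond = [0.0] * n_responses   # probability of being conditioned to each response
    uncond = 1.0                 # Assumption 1: unconditioned at the outset
    for _ in range(n_trials):
        # With probability 1 - theta the conditioning state is unaltered.
        new = [(1 - theta) * c for c in cond]
        # With probability theta/A the element conditions to each member of the
        # set shown, whatever its previous state (total probability mass is 1).
        for e, p in zip(sets, pi):
            for i in e:
                new[i] += p * theta / set_size
        cond = new
        uncond *= (1 - theta)
    # Assumption 3: guess with probability 1/N while still unconditioned.
    return [c + uncond / n_responses for c in cond]


if __name__ == "__main__":
    pi = [0.5, 0.5, 0.0]
    print(matching_prediction(3, 2, pi))
    print(asymptote_by_iteration(3, 2, pi, theta=0.2))
```

Both calls should agree with the values $.5$, $.25$, $.25$ given above; the choice of $\theta$ affects only the speed of approach, not the asymptote.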

2. Probability Learning With Permanent Conditioning

Assumptions 1 and 3 of this model are the same as for probability learning without permanent conditioning. Assumption 2 is changed to:

2'. (i) If the stimulus element is conditioned to one of the responses reinforced, it remains so conditioned; and

(ii) if the stimulus element is not conditioned to one of the responses reinforced, then with probability $1-\theta$ its conditioning remains unchanged, and with probability $\theta/A$ it becomes conditioned to any one of the $A$ members of the reinforcement set.

Unfortunately, this model is less mathematically tractable than the preceding one, and asymptotic response probabilities were obtained only for the special cases $A = 1$, $A = N-1$, and $N = 4$ with $A = 2$. As before, the subject's being in state $i$ on trial $n$ will be denoted by $C_{i,n}$.

Let us first derive the asymptotic response probabilities for $A = 1$. The reinforcement sets are $e_1 = \{a_1\}$, $e_2 = \{a_2\}$, etc., and appear with probabilities $\pi_1, \pi_2, \ldots, \pi_N$. Thus,

(97)  $p(C_{i,n+1} \mid C_{i,n}) = (1-\theta) + \theta\pi_i ,$

since with probability $1-\theta$ the subject's conditioning undergoes no change and with probability $\pi_i\theta$ he is reinforced with $a_i$ and conditions to it. If $j \neq i$, $p(C_{i,n+1} \mid C_{j,n}) = \theta\pi_i$. By the theorem on total probability,

(98)  $p_i = \left( (1-\theta) + \theta\pi_i \right) p_i + \sum_{j \neq i} \theta\pi_i\, p_j .$

But this is equivalent to:

(99)  $p_i = (1-\theta)\, p_i + \theta\pi_i \sum_{j=1}^{N} p_j ,$

so we obtain, for $A = 1$, the probability matching result:

(100)  $p_i = \pi_i .$
For $A = N-1$ let us denote the reinforcement sets in the following way: $e_i = \{a_j : j \neq i\}$. That is, $e_i$ contains all the responses except $a_i$. Clearly there are a total of $N$ reinforcement sets whose probabilities will be given by $\pi_1, \pi_2, \ldots, \pi_N$. At this point it may be helpful to look at the transition matrix from state $C_i$ to the other states. The notation $C_i e_j$ means that the subject was in state $C_i$ and received reinforcing set $e_j$.

(101)  $\begin{array}{l|cc} & C_{i,n+1} & C_{j,n+1}\ (j \neq i) \\ \hline C_{i,n}\, e_i & 1-\theta & \theta/(N-1) \\ C_{i,n}\, e_j\ (j \neq i) & 1 & 0 \end{array}$

Thus we see that $p(C_{i,n+1} \mid C_{i,n})$ is equal to $(1-\pi_i) + (1-\theta)\pi_i$. For $j \neq i$, $p(C_{i,n+1} \mid C_{j,n}) = \pi_j\, \theta/(N-1)$. By the theorem on total probability, we see that:

(102)  $p_i = (1 - \pi_i + \pi_i - \theta\pi_i)\, p_i + \dfrac{\theta}{N-1} \sum_{j \neq i} \pi_j p_j ,$

or

(103)  $\pi_1 p_1 + \pi_2 p_2 + \cdots + \pi_i p_i + \cdots + \pi_N p_N = N \pi_i p_i .$
As this is true for all $i$,

(104)  $\pi_i p_i = \pi_j p_j$ for all $i$ and $j$.

Since $p_1 + p_2 + \cdots + p_i + \cdots + p_N = 1$,

(105)  $p_i = 1 - \sum_{j \neq i} p_j .$

Substituting (104) into (105) we obtain

(106)  $p_i = 1 - \pi_i p_i \sum_{j \neq i} \dfrac{1}{\pi_j} ,$

which is equivalent to

(107)  $p_i = \dfrac{1/\pi_i}{\sum_{j=1}^{N} 1/\pi_j} .$


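The derivation of asymptotic response probabilities for $N = 4$, $A = 2$ is both tedious and unilluminating; we shall state only the results. The six reinforcing events are labeled as follows: $e_1 = \{a_1, a_2\}$, $e_2 = \{a_1, a_3\}$, $e_3 = \{a_1, a_4\}$, $e_4 = \{a_2, a_3\}$, $e_5 = \{a_2, a_4\}$, and $e_6 = \{a_3, a_4\}$.

The response probabilities are given by:

(108)  $p = B^{-1} r ,$

where

(109)  $p = \begin{pmatrix} p_1 \\ p_2 \\ p_3 \\ p_4 \end{pmatrix}, \quad r = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \text{and} \quad B = \begin{pmatrix} 1 & 1 & 1 & 1 \\ -2(\pi_4+\pi_5+\pi_6) & \pi_2+\pi_3 & \pi_1+\pi_3 & \pi_1+\pi_2 \\ \pi_4+\pi_5 & -2(\pi_2+\pi_3+\pi_6) & \pi_1+\pi_5 & \pi_1+\pi_4 \\ \pi_4+\pi_6 & \pi_2+\pi_6 & -2(\pi_1+\pi_3+\pi_5) & \pi_2+\pi_4 \end{pmatrix} .$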

For $A = N-1$, if $\pi_i$ is equal to zero, $a_i$ will appear in every reinforcement set. As we have seen, the theory of probability learning without permanent conditioning fails to predict the empirical result that in this case $p_i$ equals one. The model just described does predict that $p_i = 1$ on the assumptions that $\pi_i = 0$ and for $j \neq i$, $\pi_j > 0$. To see this, let us write out (107):

(110)  $p_i = \dfrac{\pi_1 \pi_2 \cdots \pi_{i-1} \pi_{i+1} \cdots \pi_N}{\pi_2 \pi_3 \cdots \pi_N + \cdots + \pi_1 \cdots \pi_{i-1} \pi_{i+1} \cdots \pi_N + \cdots + \pi_1 \pi_2 \cdots \pi_{N-1}} .$

Now all the terms in the denominator but one contain $\pi_i$; therefore, they vanish. The one that does not contain $\pi_i$ must be unequal to zero since for $j \neq i$, $\pi_j > 0$. But this term is the same term as the numerator so that $p_i = 1$.
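The $A = N-1$ result can be illustrated with a short numerical check. This is a sketch under stated assumptions rather than the original derivation: the function names, the value of $\theta$, and the example probabilities are hypothetical. It builds the one-trial transition probabilities implied by Assumption 2', approximates the asymptotic state probabilities by repeated multiplication, and compares them with the product form of (110), in which `pi[i]` is the probability of the set that omits response $i$.

```python
from math import prod


def transition_matrix(sets, pi, n, theta):
    """T[j][i]: probability of moving from conditioning state j to state i
    in one trial under Assumption 2' (permanent conditioning)."""
    T = [[0.0] * n for _ in range(n)]
    for e, p in zip(sets, pi):
        for j in range(n):
            if j in e:
                T[j][j] += p                        # (i) reinforced: stays conditioned
            else:
                T[j][j] += p * (1 - theta)          # (ii) conditioning unchanged
                for i in e:
                    T[j][i] += p * theta / len(e)   # (ii) moves to a member of e
    return T


def stationary(T, n_iter=5000):
    """Approximate the asymptotic state probabilities by repeated multiplication."""
    n = len(T)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[j] * T[j][i] for j in range(n)) for i in range(n)]
    return p


def product_form(pi):
    """Product form of the asymptote; pi[i] belongs to the set omitting response i."""
    n = len(pi)
    terms = [prod(pi[j] for j in range(n) if j != i) for i in range(n)]
    total = sum(terms)
    return [t / total for t in terms]


def leave_one_out_sets(n):
    """For A = N-1, set i contains every response except i."""
    return [tuple(r for r in range(n) if r != i) for i in range(n)]


if __name__ == "__main__":
    for pi in ([0.2, 0.3, 0.5], [0.0, 0.4, 0.6]):
        T = transition_matrix(leave_one_out_sets(3), pi, 3, theta=0.3)
        print(stationary(T), product_form(pi))
```

In the second example the set omitting the first response never occurs, so that response is always reinforced and both computations assign it probability one, as argued above.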

3.  A  Generalized  Linear  Model  for  Probability  Learning 

In Part II, Section 2.3, two distinct linear models for paired-associate learning were developed. We will apply the model exemplified in the matrix of Equation (59) of the preceding section. The basic assumption behind that matrix is that the decrement in response probability of a response not reinforced on a trial was to be spread uniformly only among reinforced responses. As noted previously, with $N$ response alternatives, $A$ of which are reinforced on any trial, there are $\binom{N}{A} = J$ different reinforcement sets, $k$ of which contain $a_1$, where $k = \binom{N-1}{A-1}$. Let us label the reinforcement sets in such a way that the first $k$ contain $a_1$, then determine $p_1$, the asymptotic response probability for $a_1$. The probabilities of the $J$ reinforcement sets are given by $\pi_1, \pi_2, \ldots, \pi_J$. Thus,


(111)  $p_{1,n+1} = \left[ (1-\theta)\, p_{1,n} \right] \sum_{i=k+1}^{J} \pi_i + \left[ p_{1,n} + \ldots \right] \sum_{i=1}^{k} \pi_i .$


The first term on the right-hand side represents $p_{1,n+1}$ given that $a_1$ was not reinforced on trial $n$ times the probability that it was not reinforced; the second term is analogous except that it assumes $a_1$ was reinforced on $n$. The part in brackets in the second term of the right-hand side follows from (64).


We  now  define  two  terms: 


(112)  $r = \ldots$  and  $\Pi = \sum_{i=k+1}^{J} \pi_i ,$

from which it follows that $(1 - \Pi) = \sum_{i=1}^{k} \pi_i$. Here $\Pi$ is the probability that $a_1$ not be included in the reinforcement set and $1 - \Pi$ is the probability that it is included. We can now rewrite (111) as


(113)  $p_{1,n+1} = (1-\theta)\, p_{1,n}\, \Pi + \left[ p_{1,n} + \theta r (1 - p_{1,n}) \right] (1 - \Pi) = p_{1,n} - \theta \Pi\, p_{1,n} + \theta r (1 - \Pi)(1 - p_{1,n}) .$

In the limit, $p_{1,n+1} = p_{1,n} = p_1$; hence,

(114)  $\theta \Pi\, p_1 = \theta r (1 - \Pi)(1 - p_1) .$

From this it follows that:

(115)  $p_1 = \dfrac{r (1 - \Pi)}{\Pi + r (1 - \Pi)} .$

As a special case of the above, if $A = 1$, then $r = 1$ and $p_1 = 1 - \Pi$, giving probability matching.
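The passage from the recursion to its asymptote can be illustrated numerically. The sketch below is not from the original text: the function names and the values chosen for $r$, $\Pi$, and $\theta$ are arbitrary assumptions made for illustration. It iterates the mean recursion $p_{1,n+1} = p_{1,n} - \theta\Pi\, p_{1,n} + \theta r (1-\Pi)(1-p_{1,n})$ and compares the limit with $r(1-\Pi)/[\Pi + r(1-\Pi)]$.

```python
def linear_asymptote(r, big_pi):
    """Closed-form asymptote r(1 - Pi) / (Pi + r(1 - Pi))."""
    return r * (1 - big_pi) / (big_pi + r * (1 - big_pi))


def iterate_recursion(r, big_pi, theta, p_start=0.5, n_trials=5000):
    """Iterate p <- p - theta*Pi*p + theta*r*(1 - Pi)*(1 - p)."""
    p = p_start
    for _ in range(n_trials):
        p = p - theta * big_pi * p + theta * r * (1 - big_pi) * (1 - p)
    return p


if __name__ == "__main__":
    # Illustrative values only; r = 1 corresponds to the special case A = 1.
    for r in (0.5, 1.0):
        print(r, iterate_recursion(r, big_pi=0.4, theta=0.1),
              linear_asymptote(r, big_pi=0.4))
```

With $r = 1$ the asymptote reduces to $1 - \Pi$, the probability-matching value; the learning rate $\theta$ cancels from the asymptote and influences only how quickly it is reached.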

This completes our discussion of probability learning with finite response sets. We have developed only a sample of the theories possible to obtain in analogy to the theories of paired-associate learning. It would seem profitable to obtain some data before continuing the theoretical development too far but, so far as we know, the only relevant data for $A = 2$ are from unpublished work of Michael Humphreys and David Rumelhart.

4. Probability Learning With a Continuum of Responses

The experiment discussed in Part II, Section 4 for a response continuum is an example of probability learning with a continuum of response and reinforcement event possibilities. The next paradigm discussed has a continuum of responses, but discrete reinforcement.

Probability learning with left-right reinforcement. Consider a task in which the subject is placed before a straight bar (perhaps 2 feet long) with a light bulb at either end. The subject is told that when he indicates a point on the bar at the beginning of each trial one of the lights will flash. His task is to minimize the average distance between the point he selects and the light flash on that trial. Clearly, this is a task with a continuum of response alternatives; it differs from the probability learning tasks to be described since there are only two reinforcing events. We shall call this task probability learning with left-right reinforcement. Reinforcement is determinate since, after one light flashes, the subject knows he should have selected that extreme end of the bar.

We first show that if the subject believes the probability of the left light flashing, $P(L)$, differs from .5, he should choose one extreme or the other. Number the leftmost point the subject can select 0 and the rightmost point 1. Let $r_n$ denote the subject's choice on trial $n$. Let $k$ equal his loss. If the left light flashes $k = r_n$, and if the right light flashes $k = 1 - r_n$. His expected loss is given by:

(116)  $E(k) = P(L)\, r_n + [1 - P(L)]\,(1 - r_n) .$

Differentiating with respect to $r_n$ we obtain,

(117)  $\dfrac{dE(k)}{dr_n} = 2P(L) - 1 .$

Assume that $P(L) > .5$; then the derivative of the subject's expected loss is strictly positive, that is, $E(k)$ is an increasing function of $r_n$, so $E(k)$ is minimized by choosing $r_n = 0$. Exactly similar arguments hold if $P(L) < .5$.
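The argument can be illustrated with a few lines of Python. This is only a sketch: the function names and the grid search are illustrative choices, not part of the original analysis. It evaluates the expected loss $P(L)\, r + [1 - P(L)](1 - r)$ on a grid of responses in $[0, 1]$ and returns the minimizing point.

```python
def expected_loss(r, p_left):
    """Loss is r if the left light flashes and 1 - r if the right light flashes."""
    return p_left * r + (1 - p_left) * (1 - r)


def best_response(p_left, grid_size=101):
    """Grid point in [0, 1] with the smallest expected loss."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return min(grid, key=lambda r: expected_loss(r, p_left))


if __name__ == "__main__":
    for p_left in (0.7, 0.3):
        print(p_left, best_response(p_left))
```

Because the expected loss is linear in $r$, the minimum always falls at an end of the bar: at 0 when $P(L) > .5$ and at 1 when $P(L) < .5$.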

The strategy just analyzed is an optimal strategy. Our belief, however, is that the subject's behavior will be analogous to the probability-matching behavior exhibited in finite probability learning situations. That is, we expect that $r_n$ will approach $1 - \pi_L$, where $\pi_L$ is the noncontingent probability that the left light will flash.

A simple linear model gives this result. Let $r_{n+1}$ be given in terms of $r_n$:

(118)  $r_{n+1} = \begin{cases} \theta r_n + 1 - \theta & \text{if the right light flashes,} \\ \theta r_n & \text{if the left light flashes.} \end{cases}$

It is then easy to show that

(119)  $\lim_{n \to \infty} E(r_n) = 1 - \pi_L .$

The linear model predicts, of course, considerable variation in $r_n$, even after its expected value reaches asymptote.
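A small simulation makes the two properties of the linear model visible. The sketch below is illustrative only: the function names, the seed, and the parameter values ($\pi_L = 0.3$, $\theta = 0.9$) are assumptions, not values from the text. It applies the rule $r_{n+1} = \theta r_n + 1 - \theta$ on a right flash and $r_{n+1} = \theta r_n$ on a left flash, then reports the mean and variance of the response over late trials.

```python
import random


def simulate_linear(pi_left, theta, n_trials, rng, r_start=0.5):
    """Simulate the linear model for left-right reinforcement."""
    r = r_start
    path = []
    for _ in range(n_trials):
        if rng.random() < pi_left:     # left light flashes
            r = theta * r
        else:                          # right light flashes
            r = theta * r + 1 - theta
        path.append(r)
    return path


def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


if __name__ == "__main__":
    rng = random.Random(0)             # fixed seed, for reproducibility only
    path = simulate_linear(pi_left=0.3, theta=0.9, n_trials=200_000, rng=rng)
    print(mean_and_variance(path[50_000:]))
```

The late-trial mean should lie close to $1 - \pi_L = 0.7$, while the variance stays well away from zero however long the simulation runs.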

Let us now turn to a stimulus-sampling model that also gives probability matching, but that predicts decreasing motion around $1 - \pi_L$ as $n$ increases. In the stimulus-sampling model, the subject is conditioned to one response on any given trial. He chooses his response, however, from some distribution "smeared" about the response he is conditioned to. In most stimulus-sampling models this smearing distribution, $k(r \mid p)$ where $p$ is a vector representing the parameters of the distribution, maintains a constant shape in the course of learning. In this model the shape of the distribution changes as does the response it is smeared around. Specifically, the model assumes that $k$ is a beta distribution with parameters $\alpha_n$ and $\beta_n$. The expected value of $r_n$ is, then, $\int_0^1 r\, k(r \mid \alpha_n, \beta_n)\, dr$. Since $k$ is a beta, this becomes

(120)  $E(r_n) = \dfrac{\alpha_n}{\alpha_n + \beta_n} .$

The model further assumes that $\alpha_1 = \beta_1 = c_1$, where $c_1$ is a parameter to be estimated. The conditioning rule is:

(121)  $\alpha_{n+1} = \begin{cases} \alpha_n + c_2 & \text{if the right light flashes,} \\ \alpha_n & \text{if the left light flashes,} \end{cases}$

and

(122)  $\beta_{n+1} = \begin{cases} \beta_n & \text{if the right light flashes,} \\ \beta_n + c_2 & \text{if the left light flashes,} \end{cases}$

where $c_2$ is the second parameter of the theory. For $n$ large,

(123)  $\lim_{n \to \infty} E(r_n) = \lim_{n \to \infty} \dfrac{c_1 + c_2\, n (1 - \pi_L)}{2c_1 + c_2\, n} = 1 - \pi_L ,$

which corresponds to probability matching. Assuming that the probability matching prediction is borne out, this model can be compared with the linear model on the basis of response variance for $n$ large.
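The contrast with the linear model can be sketched by simulation. The code below is an illustration under assumed parameter values ($\pi_L = 0.3$, $c_1 = c_2 = 1$) with hypothetical function names; it is not the procedure used in the original work. Responses are drawn from a beta distribution whose first parameter grows by $c_2$ on a right flash and whose second parameter grows by $c_2$ on a left flash.

```python
import random


def simulate_beta(pi_left, c1, c2, n_trials, rng):
    """Simulate the beta-smearing stimulus-sampling model."""
    alpha = beta = c1
    responses = []
    for _ in range(n_trials):
        # The response is drawn from the current smearing distribution.
        responses.append(rng.betavariate(alpha, beta))
        if rng.random() < pi_left:     # left light flashes
            beta += c2
        else:                          # right light flashes
            alpha += c2
    return responses, alpha, beta


def variance(values):
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(1)             # fixed seed, for reproducibility only
    responses, alpha, beta = simulate_beta(0.3, 1.0, 1.0, 5000, rng)
    print(alpha / (alpha + beta))      # mean of the final smearing distribution
    print(variance(responses[:500]), variance(responses[-500:]))
```

The final mean should be near $1 - \pi_L = 0.7$, and the variance over the last trials should be much smaller than over the first, which is the feature that distinguishes this model from the linear one.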

Modification of subjective probabilities. In estimating a probability a subject may be said to be responding from a continuum of alternatives. If he is then reinforced with new information relevant to the probability in question, the 'normative' prediction is that he will modify his probability estimate in accord with Bayes' theorem. It is our purpose in this subsection to look at one type of probability modification behavior from an explicitly learning-theoretic point of view.

Let the subject have some simple means of responding on the interval [0,1]. Denote his response on trial $n$ by $p_n$. The experimenter places before the subject a jar containing a large number of marbles, say 1000. He tells the subject that there are 1000 marbles in the jar and that the only colors the marbles may be are chartreuse (C) and heliotrope (H). The subject is told that there may be from 0 to 1000 of each color of marble. Under these circumstances Jamison and Kozielecki [37] showed that subjects tend to have a uniform density for $p(C)$, where $p(C)$ represents the subject's estimate of the fraction of chartreuse marbles. Hence, it is natural to expect that $p_1$ will equal .5. In the experimental sequence, the subject responds with $p_n$; the experimenter fishes a marble from the jar, shows it to the subject, and replaces it; the subject responds with $p_{n+1}$. The similarity between this model and the left-right probability learning model mentioned previously is clear. Let us use the stimulus-sampling model developed for that situation, (120)-(123). From Jamison and Kozielecki's observations it is natural to assume that the parameter $c_1$ of that model be equal to one. Results of data presented in Peterson and Phillips [51] indicate that $c_2$ should be near one, and observations by Phillips, Hays, and Edwards [52] indicate that $c_2$ should be less than one. At any rate, after seeing $n_C$ chartreuse and $n_H$ heliotrope marbles, the density for $p_n$ is:


(124)  $f(p_n) = \dfrac{1}{B(n_C c_2 + 1,\; n_H c_2 + 1)}\; p_n^{\,n_C c_2}\, (1 - p_n)^{\,n_H c_2} ,$

where $n = n_C + n_H$ and $B(\cdot,\cdot)$ denotes the beta function of those arguments. The expectation of this density is:

(125)  $E(p_n) = \dfrac{n_C c_2 + 1}{n c_2 + 2} .$


Asymptotically, this model implies that the subject will arrive at the correct probability. If $c_2 = 1$, the subject's behavior is normative throughout. Thus our learning model, if it gives an adequate account of this type of data, yields the same results as a Bayesian model. (In a sense, it generalizes the normative model. See Suppes [65] for an account of the one-element model viewed as a generalization of Bayesian updating.) What are the implications of this?

If we assume that the stimulus-sampling model is also adequate for the left-right probability learning situation, we have a single learning-theoretic model that accounts for behavior that in one case is normative and in the other case is not. Bayesian or degraded Bayesian models are adequate in some cases, because they approach the learning-theoretic models. The implication here is that our notion of optimality is very limited.
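The claim that the model is normative when $c_2 = 1$ can be checked numerically. The following sketch is an illustration, not the original computation: the function names and the marble counts are hypothetical, and the unknown fraction is treated as continuous on $[0, 1]$ rather than as one of 1001 discrete jar compositions. It compares the model estimate $(n_C c_2 + 1)/(n c_2 + 2)$ with the posterior mean obtained by applying Bayes' theorem to a uniform prior through direct numerical integration.

```python
def model_estimate(n_c, n_h, c2=1.0):
    """Expected response of the learning model with c_1 = 1."""
    return (n_c * c2 + 1) / ((n_c + n_h) * c2 + 2)


def bayes_posterior_mean(n_c, n_h, grid=20000):
    """Posterior mean under a uniform prior, by midpoint-rule integration
    of likelihood p**n_c * (1 - p)**n_h over the interval [0, 1]."""
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        like = p ** n_c * (1 - p) ** n_h
        num += p * like
        den += like
    return num / den


if __name__ == "__main__":
    n_c, n_h = 7, 3                    # hypothetical draws: 7 chartreuse, 3 heliotrope
    print(bayes_posterior_mean(n_c, n_h))
    print(model_estimate(n_c, n_h, c2=1.0))
    print(model_estimate(n_c, n_h, c2=0.5))
```

With $c_2 = 1$ the two estimates coincide; with $c_2 < 1$ the model estimate stays closer to .5 than the Bayesian value, which is the sense in which such a subject revises less than the normative model prescribes.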


Multipoint reinforcement. We now consider a probability learning paradigm with a continuum of responses analogous to that with a finite response set, but $A$ (the number of responses in a reinforcement) is greater than 1. There are $A$ points on the circumference of the circle reinforced after the subject has set his pointer. With probability $1-\theta$ the mode, $z$, of his smearing distribution (defined earlier) is assumed to remain unchanged. With probability $\theta/A$, $z$ moves to any one of the points reinforced. Thus the recursion on the density for $z$ is given by:

(126)  $g_{n+1}(z) = (1-\theta)\, g_n(z) + \dfrac{\theta}{A} f_1(z) + \dfrac{\theta}{A} f_2(z) + \cdots + \dfrac{\theta}{A} f_A(z) ,$

where $f_i(z)$ gives the density from which the $i$th reinforcement was drawn. For large $n$, $g_{n+1}(z) = g_n(z)$. Hence,

(127)  $g_\infty(z) = g_\infty(z)(1-\theta) + \dfrac{\theta}{A} \left[ f_1(z) + f_2(z) + \cdots + f_A(z) \right] = \dfrac{1}{A} \sum_{i=1}^{A} f_i(z) .$

In Suppes [1959], the asymptotic response density, $r_\infty(x)$, is derived from the above and shown to be:

(128)  $r_\infty(x) = \dfrac{1}{A} \sum_{i=1}^{A} \int_0^{2\pi} k(x \mid z)\, f_i(z)\, dz .$

The interesting prediction of this theory is that the same $r_\infty(x)$ is obtainable for multiple reinforcement as for single reinforcement, if the density for the single reinforcement is the average of the densities for multiple reinforcement.

Let us consider one other probability learning task. The subject is reinforced on each trial with a region of length $2a$ centered at $y$, where $y$ is a random variable with density $f(y)$. The simplest assumption is that if the subject becomes conditioned, he conditions to point $y$. If this is so, clearly, $r_\infty(x)$ must be given by:

(129)  $r_\infty(x) = \int_0^{2\pi} k(x \mid z)\, f(z)\, dz .$

This is somewhat counterintuitive since it is independent of $a$. Perhaps a more reasonable conditioning assumption would be that $z$ is distributed uniformly over the reinforced region if conditioning occurs. Let us define $U(z \mid y, a)$ to equal $1/2a$ for $y - a \le z \le y + a$ and 0 elsewhere. The density for $z$ on trial $n+1$, given that conditioning occurred, is denoted $U_a(z)$; it is given by:

(130)  $U_a(z) = \int_0^{2\pi} U(z \mid y, a)\, f(y)\, dy .$

The recursion for $g_n(z)$ is, then,

(131)  $g_{n+1}(z) = (1-\theta)\, g_n(z) + \theta\, U_a(z) ,$

and the asymptotic response density is:

(132)  $r_\infty(x) = \int_0^{2\pi} k(x \mid z)\, U_a(z)\, dz .$

We shall derive no further statistics for these models at this time.
III. CONCLUDING COMMENTS: MORE GENERAL INFORMATION STRUCTURES

In the experimental paradigms discussed thus far the set E of possible reinforcements can be divided into two subsets for each stimulus-response pair. One subset contains reinforcements that indicate the subject's response to the stimulus was 'correct'; the other contains reinforcements that indicate his response was 'incorrect'. By his design of the experiment, the experimenter chose a probability distribution for each S-R pair over the set of possible reinforcements; this distribution generates a distribution on the subsets 'correct' and 'incorrect'. If the subject can choose a response to each stimulus so that he is certain to receive a 'correct' reinforcement, we have the case defined previously in this paper as paired-associate learning. If the distribution on E depends only on the stimulus and not on trial number or the subject's response, the reinforcement is noncontingent. If the distribution on E is noncontingent, and there is no response that will insure the subject he is correct, we have probability learning.

Our purpose in this concluding section is to consider briefly the case where the set E has more than two subsets that are equivalence classes with respect to their value to the subject. To give a more concrete idea of what we have in mind, we will first discuss the experiment by Keller, Cole, Burke, and Estes [40] that illustrates the notion of information via differential reward.

The subjects were faced with a paired-associate list of 25 items. There were two response alternatives and 5 possible reinforcements: the numbers 1, 2, 4, 6, and 8. One of these numbers was assigned to each S-R pair as its point value. The subject was told that his pay at the end of the session depended on the number of points he accumulated. So, for example, if the reward for pushing the left button, if XAQ were the stimulus, was 4, and the reward for pushing the right button was 1, the subject should learn to push only the left button.

The  experiment  was  run  under  two  different  conditions.  In  one  the 
subject  was  told  at  the  end  of  each  trial  the  reward  value  for  both 
of  the  possible  responses;  in  the  other  he  was  told  only  the  reward 
value  for  the  response  he  had  selected.  In  the  latter  case,  since 
there  were  more  than  two  possible  reward  values,  knowing  the  value  of 
one  response  gave  only  partial  information  concerning  the  optimal  re¬ 
sponse.  This  is  an  example  of  information  via  differential  reward. 

Let us consider now information via differential reward in the context of alternative types of information a subject might receive. A learning experiment may include: (i) a set S of stimuli, (ii) a set R of response alternatives, (iii) a set E of reinforcements, (iv) a partition P of E into sets of reinforcements equivalent in value to the subject, and (v) an experimenter-determined function $f$ from $S \times R$ into $P_\varepsilon$, where $P_\varepsilon$ is the probability simplex in $\varepsilon$-dimensional space and $\varepsilon$ is the cardinality of E. The probability that each reinforcement occurs is given by $f$ as a function of the stimulus presented and the response selected. If $\varepsilon'$ is the number of members in P, $f$ determines a function $f'$ from $S \times R$ into $P_{\varepsilon'}$, and $f'$, then, gives the probability of each outcome value as a function of the stimulus and response chosen. The subject's task in a learning experiment is to learn as much as is necessary about $f'$ so that he may make the optimal response to each stimulus.
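The way $f$ determines $f'$ can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the stimulus and response labels, the three reinforcements, their probabilities, and the function name. For each stimulus-response pair, the probability of a value class is the sum of the probabilities of the reinforcements in that class of the partition P.

```python
def induced_value_distribution(f, partition):
    """Collapse f (probabilities over reinforcements) into f'
    (probabilities over the value classes of the partition)."""
    return {
        pair: [sum(dist[e] for e in block) for block in partition]
        for pair, dist in f.items()
    }


# Hypothetical experiment: one stimulus, two responses, three reinforcements,
# of which "tone" and "light" are assumed equal in value to the subject.
f = {
    ("s1", "left"):  {"tone": 0.5,   "light": 0.25,  "buzzer": 0.25},
    ("s1", "right"): {"tone": 0.125, "light": 0.125, "buzzer": 0.75},
}
partition = [("tone", "light"), ("buzzer",)]

f_prime = induced_value_distribution(f, partition)

if __name__ == "__main__":
    for pair, dist in f_prime.items():
        print(pair, dist)
```

Here $f$ maps each pair into a simplex of dimension three and $f'$ into one of dimension two; the subject needs to learn only $f'$ to choose the better response.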
The subject learns about $f'$ from information provided him by the experimenter. We may classify this information into three broad types. First, exogenous information is provided before the experiment begins. The subject learns what the responses are, what the stimuli are, whether reinforcement is contingent, possible reward values, number of trials, etc. Parts of this exogenous information might, of course, be deliberate misinformation.

The second type consists of information concerning $f'$ for a fixed stimulus. In a typical paired-associate experiment the subject receives complete information concerning $f'$ on each trial for each stimulus. In the paradigms considered in Part II, subjects are given partial information by having E be the set of subsets of R (perhaps of fixed cardinality). The subject is told on each trial that the correct response was among those shown. Another type of information concerning the optimal response to a given stimulus is information via differential reward. Here the subject learns the rewards accruing to the members of the reinforcement set. The forms of information of this type depend, then, on the structure of the reinforcement set.

The third consists of information concerning $f'$ for a fixed response. That is, does knowledge that response $i$ is optimal for stimulus $j$ give any information relevant to the optimal responses for other stimuli? This third type of information is obtained by 'concept formation', 'stimulus generalization', 'pattern recognition', 'recognition of universals', etc.; the term chosen depends on whether you are a psychologist, engineer, or philosopher.
Notice the symmetry between the second and third types of information a subject can be given. For a particular stimulus, the subject may receive knowledge relevant to the optimal response that concerns structure of the response set. For a particular response, the subject may receive information about the stimuli for which that response is optimal by placing structure on the stimulus set. The role of information via differential reward in this context is one way of placing structure on the reinforcement set; earlier sections of this dissertation considered other ways in detail.

For concept formation, there must be some sort of structure on the stimulus set. Roberts and Suppes [53] and Jamison [35] advanced quite different models for concept learning in which the basic structure on the stimulus set is of a particularly simple form, but they jointly assume that each stimulus is capable of being completely described by specifying for each of several attributes (e.g., color, size, ...) the value the stimulus takes on that attribute. We consider it an important theoretical task in learning theory to describe in detail other forms of structure that can be put on sets of stimuli.

The  results  in  this  paper  should  be  considered  as  simply  a  pro¬ 
legomenon  to  detailed  analysis  of  information  structures  in  learning 
theory.  Our  results  have  been  limited  to  rather  special  types  of  in¬ 
formation  structures  placed  on  reinforcement  sets.  More  general  struc¬ 
tures  need  to  be  considered  and,  more  important,  information  structures 
on  stimulus  sets — concept  learning — must  be  brought  within  the  scope 
of  the  analysis. 

Part  Three/One  of  this  dissertation. 
Section Three

Section Four

EMPIRICAL  STUDIES 

In  this  final  section  I  report  on  several  empirical  studies  of 
individual  choice  behavior.  Simon  [15,  p.  2]  has  observed  that  this 
is  an  area  of  study  that  has  little  interested  economists:  "Economists 
have  been  relatively  uninterested  in  descriptive  microeconomics — under¬ 
standing  the  behavior  of  individual  economic  agents — except  as  this  has 
been  necessary  to  provide  a  foundation  for  macroeconomics.  The  norma¬ 
tive  economist  'obviously'  doesn't  need  a  theory  of  human  behavior: 
he wants to know how people ought to behave, not how they do behave".
While  Simon's  comment  does  seem  generally  valid,  empirically  oriented 
papers  concerning  individual  choice  behavior  do  occasionally  appear  in 
the  economics  literature.  Some  of  these  studies — for  example  the  duo¬ 
poly  studies  of  Suppes  and  Carlsmith  [16]  and  Friedman  [5] — are  attempts 
to represent organizational behavior by that of individuals. The rest
of  these  studies  are  genuine  attempts  to  study  individual  choice  be¬ 
havior,  though  admittedly  under  somewhat  contrived  circumstances.  It 
is  this  last  type  of  study  that  I  shall  report  on  here;  the  next  three 
parts  of  this  dissertation  are  empirical  studies  related  to  the  theo¬ 
retical  developments  of  Section  Three. 

Part Four/One reports on an attempt to empirically measure the structure of subjects' beliefs under conditions of total uncertainty, where they have no information concerning the relevant probabilities. This work was done in collaboration with Dr. Jozef Kozielecki of the University of Warsaw and has been previously published as Jamison and Kozielecki.
Part Four/Two is concerned with an individual's learning or adaptive behavior when his reinforcements carry only partial information concerning the optimal policy. This work is closely related to the theoretical developments of Part Three/Two and was done in collaboration with Mr. Richard Freund, Prof. Patrick Suppes, and, primarily, Miss Deborah Lhamon. It will be published as a part of Jamison, Lhamon, and Suppes [8].

Part Four/Three reports on an unpublished study of individual information-seeking behavior done in collaboration with Miss Amy Hersh. The results are quite erratic. While this may be an artifact of our particular experimental design, I am inclined to think otherwise. Informal experimentation earlier by Mr. Michael Humphreys and me, using computer control of subject stimulus, resulted in similarly erratic behavior.
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Part  Four/One 

SUBJECTIVE  PROBABILITIES  UNDER  TOTAL  UNCERTAINTY 

I .  INTRODUCTION 

Humans must frequently choose among several courses of action under
circumstances such that the outcome of their choice depends on an
unknown "state of nature". Let us denote the set of possible states of
nature by $\Omega$ and consider $\Omega$ to have m members that are
mutually exclusive and collectively exhaustive: $\omega_1, \omega_2,
\ldots, \omega_m$. The vector $\vec{\xi} = (\xi_1, \xi_2, \ldots,
\xi_m)$ is a probability distribution over $\Omega$ if and only if
$\sum_{i=1}^{m} \xi_i = 1$ and $\xi_i \ge 0$ for $i = 1, \ldots, m$.
$\xi_i$ corresponds to the probability that $\omega_i$ will occur.

Edwards [3], Luce and Suppes [11], and others dichotomize experimental
situations involving choice behavior in the following way. If the
decision-maker's choice determines the outcome with probability 1
(i.e., one of the $\xi_i$'s is equal to 1), then the experimental
situation is one with certain outcomes; otherwise, the outcome is
uncertain. If the subject knows the probability distribution over the
outcomes, i.e., if he knows $\vec{\xi}$, his choice is risky; if he
only has "partial knowledge" or "no knowledge" of $\vec{\xi}$ his
choice is partially or totally uncertain. We shall use "total
uncertainty" in this last way; our purpose is to examine the structure
of a subject's beliefs when he has no knowledge of $\vec{\xi}$, that
is, when the S is totally uncertain.

Jamison [6] has proposed a definition of total uncertainty that is an
extension of the Laplacian principle of insufficient reason. This
definition and some of its implications will be described briefly here
as theoretical background for our experimental results.

Consider the set of all possible probability distributions over
$\Omega$, that is, the set of all vectors $\vec{\xi}$. Let us denote
this set by $\Xi$ and describe the decision-maker's knowledge of
$\vec{\xi}$ by a density $f(\xi_1, \xi_2, \ldots, \xi_m) =
f(\vec{\xi})$ defined on $\Xi$. If $f(\vec{\xi})$ is an impulse
($\delta$ function) at $\vec{\xi} = (1, 0, \ldots, 0)$ or $\vec{\xi} =
(0, 1, 0, \ldots, 0)$, or ..., $\vec{\xi} = (0, 0, \ldots, 1)$, then
decision-making is under certainty. If $f(\vec{\xi})$ is an impulse
elsewhere in $\Xi$, the decision-making is risky. If $f(\vec{\xi})$ is
a constant, the decision-maker is, by definition, totally uncertain of
$\vec{\xi}$. The intuitive motivation for this definition is that if
$f(\vec{\xi})$ is a constant, no probability distributions over
$\Omega$ are more likely than any others. Partial uncertainty occurs
when $f(\vec{\xi})$ is neither an impulse nor a constant.*

If K is the constant value of $f(\vec{\xi})$ under total uncertainty,
then:

$$\int \cdots \int_{\Xi} K \, d\xi_m \, d\xi_{m-1} \cdots d\xi_1 = 1. \tag{1}$$


Evaluating this definite integral enables us to find K, which turns
out to be $(m - 1)!/\sqrt{m}$. The probability that $\xi_1$ is greater
than some specific value, say C, is given by:

$$\mathrm{prob}(\xi_1 > C) = \int_C^1 \int_0^{1-\xi_1} \cdots \int_0^{1-\xi_1-\cdots-\xi_{m-2}} \sqrt{m}\, K \, d\xi_{m-1} \cdots d\xi_2 \, d\xi_1 = (1 - C)^{m-1}. \tag{2}$$

* Luce and Raiffa [10] review normative theories of decision-making
under total uncertainty. Extensions of these other theories may be
found in Atkinson, Church, & Harris [1]. Savage [13] presents a number
of objections to the probability of probabilities approach used here.
These alternatives and objections are discussed in Jamison [6].

One minus $\mathrm{prob}(\xi_1 > C)$ is simply the probability that
$\xi_1 \le C$, or the marginal cumulative for $\xi_1$, which we shall
denote by $F_1(C)$. By symmetry $F_1(C) = F_2(C) = \cdots = F_i(C) =
\cdots = F_m(C)$; thus we have:

$$F_i(C) = 1 - (1 - C)^{m-1}. \tag{3}$$

Fig. 1 shows $F_i(C)$ for several values of m.


Insert  Fig.  1  about  here 


The derivative of the marginal cumulative is the marginal density,
which we shall denote $f_i(C)$:

$$f_i(C) = \frac{dF_i(C)}{dC} = (m - 1)(1 - C)^{m-2}. \tag{4}$$

Fig. 2 shows $f_i(C)$ for several values of m.


Insert  Fig.  2  about  here 
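
As a hedged numerical check on the normative cumulative $1 - (1 - C)^{m-1}$ of equation (3), the sketch below draws probability vectors with a constant density over the simplex and compares the empirical marginal cumulative with the formula. The function names, the sample size, and the sampling method (normalizing independent unit exponentials) are illustrative assumptions and are not part of the original study.

```python
import random


def normative_cdf(c, m):
    """Marginal cumulative under total uncertainty: 1 - (1 - c)^(m - 1)."""
    return 1.0 - (1.0 - c) ** (m - 1)


def sample_uniform_simplex(m, rng):
    """Draw one probability vector with constant density on the simplex.

    Normalizing m independent unit exponentials yields a flat
    Dirichlet(1, ..., 1) draw (an assumed, standard sampling device).
    """
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_cdf(c, m, n_samples=50_000, seed=0):
    """Fraction of sampled vectors whose first component is <= c."""
    rng = random.Random(seed)
    hits = sum(
        sample_uniform_simplex(m, rng)[0] <= c for _ in range(n_samples)
    )
    return hits / n_samples


if __name__ == "__main__":
    for m in (2, 4, 8):
        for c in (0.1, 0.5):
            print(m, c, normative_cdf(c, m), empirical_cdf(c, m))
```

Under these assumptions the two columns printed for each (m, C) pair should agree to within sampling error.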


The purpose of our experiment was to determine if the normative
model just described for belief under total uncertainty approximates
the actual structure of Ss' beliefs. To achieve this purpose we
placed Ss in a situation of total uncertainty and then empirically
determined the cumulative $F_i(C)$ for a number of values of m.

II. METHOD

Subjects 

The  Ss  were  30  students  from  Stanford  University  fulfilling 
course  requirements  for  introductory  psychology.  Each  participated  in 
one  experimental  session  of  approximately  30  minutes  duration.  Ss 
were  run  individually. 

Experimental  design  and  procedure 

At the onset of the experiment the Ss were told that the experimenter
wished to examine their beliefs concerning the outcome of a
hypothetical scientific experiment about which the Ss would be given
very little information. A particle measuring device would be placed
into an environment in which there were m distinct types of particles.
The Ss were told that the particle measuring device counted the number
of each type of particle striking it in any given time interval
and that it was left in the environment until a total of 1000 particles
of the m types had been detected. A copy of the instructions is
included as an Appendix to Part Four/One.

The  experiment  consisted  of  three  series  run  with  10  subjects 
each;  in  Series  I  m  =  2,  in  Series  II  m  =  4  and  in  Series  III 
m  =  8.  For  m  =  2,  the  particles  were  named  u  and  e:  for  m  =  4 
they  were  named  w,  e ,  5,  and  and  for  m  =  8  they  were  named  w, 

e,  6,  <Jj,  5*  x»  and  P.  The  experimenter  asked  the  Ss  a  list  of 
questions of the following form: "What do you think the probability
is that the particle measuring device counted less than 500 e-particles
among the 1000 total?" The Ss were asked to write their responses
as a two-digit decimal on a 3" x 5" card, then to turn the card over.
Each S was given all the time he wished to answer. With m = 2 and
m = 4, Ss were asked for each particle what they thought the probability
was that less than 25, 100, 200, 350, 500, 650, 800, 900, and 975 of
that type of particle would be among the 1000 counted. The question
order was random. For m = 8 the 350, 650, and 975 questions were
deleted. After the experiment Ss were asked questions concerning
their method of answering.


III. RESULTS

The results were a number of discrete values of $F_i(C)$ for each
particle and for each subject. For each particle we pooled the results
of the 10 subjects who were tested for each value of m. We then did
a standard analysis of variance test to ascertain whether any
significant differences existed in Ss' responses for the different
particles. As Table 1 shows, there were no significant differences
among particles at the .05 level.

Table 1 - Analysis of Variance on Differences Among Particles

Series    df       F       Significance Level
m = 2     1/162    .10     p > .05
m = 4     3/324    .35     p > .05
m = 8     7/432    1.03    p > .05

What Table 1 indicates is that Ss accepted Laplace's principle of
insufficient reason; they showed no preference for any particular
particles. The Ss' answers to questions after experimentation
confirmed this result. Since Ss accepted the principle of insufficient
reason, results were also pooled across particles. Figs. 3a, 3b,
and 3c show the normative cumulatives $F_i(C)$ as well as our data
points pooled across Ss and particles for each of the three different
values of m. The median responses shown in the figures correspond
closely to the means.


Insert Figs. 3a, 3b, and 3c about here


Fig. 3a clearly indicates that for m = 2 the normative model fits
the data very well, whereas for m = 4 and m = 8 there is some
relation between the normative model and the data but not a fit.

The variance analysis of the data that is displayed in Table 2
indicates that when m = 2 there is no significant difference between
the normative curve and the data at the .05 level. For m = 4 and
m = 8 the difference between the normative curve and the data is
significant at the .001 level.


Table 2 - Analysis of Variance on Differences
between Normative Models and Data

Series    df               F         Significance Level
m = 2     1/[illegible]    1.36      p > .05
m = 4     1/162            100.33    p < .001
m = 8     1/108            229.52    p < .001

Since the normative curves fit the data so poorly when m = 4
and m = 8, we decided to use a one-parameter curve of the same form
as the normative model and fit it to the data by least squares
techniques. That is, we wished to describe the data by a curve of the
following nature:

$$F_i^*(C) = 1 - (1 - C)^{m^* - 1}. \tag{5}$$

The * superscript indicates that $F_i^*(C)$ and m* are descriptive
rather than normative. The least squares estimate of m* is that
value of m* which minimizes the A given in equation (6).

$$A = \sum_{j} \left\{ \left[ 1 - (1 - C_j)^{m^* - 1} \right] - P_{j,\mathrm{obs}} \right\}^2, \tag{6}$$

where $C_1 = 25/1000$, $C_2 = 100/1000$, etc., and $P_{j,\mathrm{obs}}$
is the mean probability estimate of the Ss. Table 3 shows the least
squares estimates of m* computed numerically on Stanford's IBM 7090.
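
The least squares fit described above can be sketched as a one-dimensional minimization of the sum of squared differences between $1 - (1 - C_j)^{m^* - 1}$ and the observed mean estimates. The sketch below is a minimal illustration, not the original IBM 7090 program: the golden-section search, the search interval, and the function names are assumptions, and the "observed" values are synthetic numbers generated from the model itself rather than the study's data.

```python
def model_cdf(c, m_star):
    """Descriptive cumulative: 1 - (1 - c)^(m* - 1)."""
    return 1.0 - (1.0 - c) ** (m_star - 1.0)


def squared_error(m_star, cs, p_obs):
    """Sum of squared differences between model and observed means."""
    return sum((model_cdf(c, m_star) - p) ** 2 for c, p in zip(cs, p_obs))


def fit_m_star(cs, p_obs, lo=1.0, hi=20.0, tol=1e-6):
    """Golden-section search for the m* minimizing squared_error on [lo, hi].

    Assumes the error is unimodal in m* on the interval, which holds
    for data generated by the model but is not guaranteed for noisy data.
    """
    ratio = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    x1 = b - ratio * (b - a)
    x2 = a + ratio * (b - a)
    f1 = squared_error(x1, cs, p_obs)
    f2 = squared_error(x2, cs, p_obs)
    while b - a > tol:
        if f1 < f2:
            # Minimum lies in [a, x2]; reuse x1 as the new upper probe.
            b, x2, f2 = x2, x1, f1
            x1 = b - ratio * (b - a)
            f1 = squared_error(x1, cs, p_obs)
        else:
            # Minimum lies in [x1, b]; reuse x2 as the new lower probe.
            a, x1, f1 = x1, x2, f2
            x2 = a + ratio * (b - a)
            f2 = squared_error(x2, cs, p_obs)
    return (a + b) / 2


if __name__ == "__main__":
    # The nine question values used for m = 2 and m = 4, as fractions of 1000.
    cs = [0.025, 0.1, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.975]
    # Synthetic (hypothetical) observations generated with m* = 2.63.
    synthetic_obs = [model_cdf(c, 2.63) for c in cs]
    print(fit_m_star(cs, synthetic_obs))
```

With real response means in place of the synthetic values, the same search would return the descriptive m* for that series, provided the unimodality assumption holds.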


Table 3 - Least Squares Estimates of m*

Series    m*      A
m = 2     1.98    .00
m = 4     2.63    .04
m = 8     4.05    3.07

Figs. 3b and 3c show $F_i^*(C)$ based on the values of m* given
in Table 3.

Our data indicate that Ss' beliefs are quite close to the normative
model for m = 2, scarcely a surprising result. For m > 2
Ss' beliefs shift toward the normative model, but not sufficiently far.
The reason for this is suggested in Figs. 4a, 4b, and 4c where $f_i(C)$
and $f_i^*(C)$ are plotted. ($f_i^*(C)$ is the descriptive density
based on the value of m* given in Table 3 inserted into equation (4).)


Insert  Figures  4a,  4b,  and  4c  about  here 


Fig. 4 shows that Ss underestimate probability density when the
density is relatively high and overestimate the density when the
density is relatively low. When the density is constant (m = 2),
they neither underestimate nor overestimate it. This is a
generalization to situations involving total uncertainty of the
well-known work of Preston and Baratta [1948] and others who have shown
that Ss tend to underestimate high probabilities and overestimate low
ones.

IV.  DISCUSSION 

Our  findings  corroborate  the  results  of  Cohen  and  Hansel  [2]  that 
Ss  tend  to  apply  the  principle  of  insufficient  reason  if  they  are 
given  no  information.  In  addition,  the  phenomenon  of  underestimating 
high  probabilities  and  overestimating  low  is  shown  to  have  a  direct 
analog in situations involving probability densities. Here Ss
underestimate regions of high density and overestimate regions of low
density. 

Our results have an important bearing on the question of the
consistency of Ss' beliefs. An individual's beliefs (subjective
probability estimates) are said to be incoherent if an alert bookmaker
can arrange a set of bets based on the person's probabilities such that
the person can win in no eventuality. When the probabilities are well
known (i.e., when $f(\vec{\xi})$ is an impulse at some particular
$\vec{\xi}$) a necessary and sufficient condition for coherence is that
the sum of the probabilities of mutually exclusive and collectively
exhaustive events be unity (see Shimony [14]). Analogously, a necessary
(but not sufficient) condition for coherence when probabilities are not
well known is that the sum of the expectations of the probabilities be
unity. That is, the R defined below must equal one.

$$R = \sum_{i=1}^{m} \int_0^1 C \, f_i(C) \, dC. \tag{7}$$

Since all the $f_i$'s are equal (from the insignificance of the
differences among particles),

$$R = m \int_0^1 C (m^* - 1)(1 - C)^{m^* - 2} \, dC = \frac{m}{m^*}. \tag{8}$$

Thus  R  =  1  only  when  m*  =  m.  It  is  clear  from  Table  3  that  when 
m  =  4  and  m  =  8,  the  Ss  in  our  experiment  had  beliefs  that  were 
strongly  incoherent. 
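
As a hedged illustration of the coherence condition, the sketch below evaluates the integral $\int_0^1 C (m^* - 1)(1 - C)^{m^* - 2} dC$ numerically and compares m times that value with the closed form m/m*. The midpoint-rule integrator, step count, and function names are assumptions made for illustration; the (m, m*) pairs are the ones reported in Table 3.

```python
def expected_probability(m_star, steps=100_000):
    """Midpoint-rule estimate of the integral over [0, 1] of
    C * (m* - 1) * (1 - C)^(m* - 2), the mean of the descriptive density."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        c = (k + 0.5) * h
        total += c * (m_star - 1.0) * (1.0 - c) ** (m_star - 2.0)
    return total * h


def coherence_ratio(m, m_star):
    """Closed form for the sum of expected probabilities: R = m / m*."""
    return m / m_star


if __name__ == "__main__":
    # (m, m*) pairs taken from Table 3.
    for m, m_star in ((2, 1.98), (4, 2.63), (8, 4.05)):
        numeric = m * expected_probability(m_star)
        print(m, m_star, round(coherence_ratio(m, m_star), 2), round(numeric, 2))
```

For the Table 3 estimates the closed form gives R of roughly 1.01, 1.52, and 1.98 for m = 2, 4, and 8, which is the sense in which the beliefs for m = 4 and m = 8 depart from coherence.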

Our study is an examination of the static structure of a person's
beliefs when he is in a situation of total uncertainty. The natural
extension of this work is to examine the kinematics of belief change
when the S is given information relevant to the situation. Work on
the kinematics of belief change when probabilities are well known is
reported in a number of papers in a volume edited by Edwards [4].

Appendix to Part Four/One
Instruction to Subjects

The instructions that were read to the Ss when m = 8 are given
below. The instructions for m = 2 and m = 4 are the same except
for obvious modifications.

********* 

We are running an experiment to examine the nature of a person's
intuitions concerning situations where he has little or no concrete
evidence to guide him. You will be asked to estimate the likelihood
of certain propositions concerning a hypothetical scientific experiment.
While there are no absolutely "right" or "wrong" answers, some answers
are better than others. Your response will be evaluated against a
hypothetical ideal subject.

Let me now describe the hypothetical scientific situation about
which we wish to examine your beliefs. A particle measuring device
is placed into an environment where there are 8 distinct types of
particles which we shall designate by letters of the Greek alphabet:
w, c , 1 11, , f, , f, x. O. What the particle measuring device does is
count the number of each type of particle that hits it in a given time
interval. We leave the counter in the environment until it has been
struck by a total of 1000 particles of the 8 types. Do you remember
what the 8 types were? Prior to the experiment you are assumed to have
absolutely no knowledge about the relative numbers of the 8 types of
particles except that some of each may exist and that no other type
of particle is in the environment. Given this scant information, and
nothing else, we want to examine your intuitions concerning how many
of each type of particle will be included in the 1000 measured by the
detector.

The questions we ask you concerning your beliefs will be of the
following form: What do you think the probability is that there are
less than some specific number of, say, e-particles among the 1000
counted? This statement would be true, of course, if there were 0,
1, 2, 3, ..., or any number up to that number of e-particles among
those counted but it would not be true if there were more than that
many e-particles. What you are being asked is how likely is it that
there are less than that number of e-particles? If you believed that
there were certainly less than that number of e-particles, you would
tell us that the probability of there being less than that number is
..... If, on the other hand, you believed that there were certainly
more than that number of e-particles, you would tell us that the
probability that there is less than that number is ..... If you
believe that it is equally likely that there are more than that number
as less, you would say the probability is .5. You can give us any
probability between zero and one.

Perhaps a more concrete example will help make things clear.
Consider an ordinary die such as this one. What do you think the
probability is that if I roll this die a number less than 2 will be on
the upturned face? What do you think the probability is of less than
5? Clearly, the probability of less than 2 must be smaller than the
probability of less than 5. Well, you see, this is exactly the same
type of question that we shall be asking concerning particles counted
by our counter. The only difference is that with a die you already
have a good idea of the probability asked for, whereas in this
experiment we are asking for your intuitions concerning unknown
probabilities.

Let  me  now  ask  you  a  few  sample  questions  before  we  begin.  First, 
what  do  you  think  the  probability  is  that  there  are  less  than  1001 
^-particles  among  those  counted?  [Explain  if  answer  is  wrong.]  What 
do  you  think  the  probability  is  that  there  are  less  than  950  w-particles 
among  the  1000  counted?  Less  than  75  e?  Remembering,  again,  that 
there  are  8  types  of  particles,  what  do  you  think  the  probability  is 
of less than 500 e-particles? Less than 950 5? [No feedback is given
on the last 4 questions.]

In front of you is a stack of 3" x 5" cards that you will write
your replies on. Would you write your replies as a two-digit decimal
... like so.

Before we begin, please feel free to ask any questions you might
have.


Part Four/Two


AN EXPERIMENT ON PAL WITH INCOMPLETE INFORMATION

I. INTRODUCTION

In Part Three/Two several models were discussed for the experimental
paradigm of PAL with noncontingent, incomplete-information
reinforcement. In the summer of 1967 such an experiment was performed
at Stanford University with several different pairings of N and A, the
response- and reinforcement-set cardinalities. This part summarizes the
results of that experiment.


II.  METHOD 

Ten subjects from the undergraduate and recent graduate community
at Stanford participated in the experiment for roughly an hour a day
for 10 days within a period of two weeks. An on-line PDP-1 computer
controlled all displays and data recording. The experimental equipment,
a cathode-ray tube (CRT) with an electric typewriter keyboard placed
directly below it, was housed in a sound-proof booth. The stimuli for
any problem were the first N figures represented by the first N keys of

randomized order of conditions for each of the three cycles for the
day. As shown later, the set order of conditions started with the
easiest condition and ended with the hardest, as determined by the
normative expected number of total errors. At any one time a subject
worked on a solution to two problems, one with the displays on the top
half of the CRT, the other with the displays on the bottom half. For
half the subjects the top problem involved only the digits and the
bottom problem only the letters. For the other subjects the top problem
was letters and the bottom problem was digits. Trials on the two
problems alternated. Both problems were on the same (N:A) condition, and
both had to be solved to a criterion of 4 successive trials on the same
problem, not successive trials in the actual experiment.

On each trial the subject saw the display "Respond From:" followed
by the N possible responses. He pressed a key corresponding to the one
he thought correct, and this response was displayed on the CRT below
the response set. The feedback set of A responses, including the
single correct response, then was displayed below the subject's
response.

III. RESULTS

The data from all experimental groups for days 1 through 10 are
considered as a whole since none of the manipulations other than the
(N:A) pairings showed consistent differences. Table 3 shows the mean
total errors for each condition. The normative expected total errors,
determined by analytic derivation or, in the more difficult cases, by
computer-run Monte Carlo simulations, are also shown.


TABLE 3

Predicted and Observed Total Errors

N:A     Observed mean     Normative expected
        total errors      total errors
2:1         0.51               0.50
6:1         0.78               0.83
10:1        0.94               0.90
10:3        1.91               1.84
6:3         2.26               2.13
10:5        2.35               2.95
10:7        6.56               5.39
6:5         7.58               7.25
10:9       18.50              17.35

The conditions with A = 1 were essentially cases of one-trial learning.
Errors of chance happened on the first trial, and on succeeding trials
the error frequency was less than .015. Thus the subjects performed
essentially normatively on these 2-item list straightforward PAL tasks.
The learning curves for the remaining six conditions with A > 1 are
plotted in Figures 6 through 11, along with the normative learning
curves. The normative error probabilities were not determined beyond
the twentieth trial. The normative and observed learning curves are
very close. Of some interest is the crossing of normative and observed
curves in the (6:5) condition (Figure 10). This better than normative
performance is most likely due to the ability of subjects to use the
4-correct-response criterion to solve one problem once the other had
been solved. This criterion use was not built into the normative model.

Fig. 6. Normative and observed learning curves for N = 10, A = 3
(vertical axis: probability of an error on trial n).

Fig. 7. Normative and observed learning curves for N = 6, A = 3
(vertical axis: probability of an error on trial n).
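
To make the idea of a normative expected number of total errors concrete, the following sketch simulates an idealized learner by Monte Carlo. It rests on assumptions that may differ from the models of Part Three/Two: on each trial the learner guesses uniformly among the responses not yet ruled out, the feedback set consists of the correct response plus A - 1 distractors drawn at random without regard to the response, and the learner keeps only the intersection of all feedback sets seen so far. All names and parameters are hypothetical.

```python
import random


def simulate_total_errors(n_responses, feedback_size, rng):
    """Total errors made by an idealized learner on one problem.

    Assumed model: uniform guessing among remaining candidates, and
    elimination of every response absent from a feedback set.
    """
    if not 1 <= feedback_size < n_responses:
        raise ValueError("need 1 <= feedback_size < n_responses")
    correct = rng.randrange(n_responses)
    distractors = [r for r in range(n_responses) if r != correct]
    candidates = set(range(n_responses))
    errors = 0
    while len(candidates) > 1:
        guess = rng.choice(sorted(candidates))
        if guess != correct:
            errors += 1
        # Noncontingent feedback: correct response plus random distractors.
        feedback = set(rng.sample(distractors, feedback_size - 1))
        feedback.add(correct)
        candidates &= feedback
    return errors


def mean_total_errors(n_responses, feedback_size, runs=20_000, seed=0):
    """Monte Carlo estimate of the expected total errors."""
    rng = random.Random(seed)
    total = sum(
        simulate_total_errors(n_responses, feedback_size, rng)
        for _ in range(runs)
    )
    return total / runs


if __name__ == "__main__":
    for n, a in ((2, 1), (6, 1), (10, 1), (10, 3), (6, 5)):
        print(f"{n}:{a}", mean_total_errors(n, a))
```

For A = 1 the assumed learner can err only on the first trial, so the estimate should approach (N - 1)/N, in line with the normative entries 0.50, 0.83, and 0.90 in the table. For A > 1 the output reflects the assumed model only and should not be read as a reproduction of the tabled values.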

The study latencies are plotted in Figures 12-13. We do not know
how to discuss these latencies meaningfully in a quantitative manner,
but present them to call attention to a qualitative peculiarity. In
the (10:9) and (6:5) conditions (Figure 13) a marked rise followed by
a decline occurred in the study latency for several trials. In these
conditions when the feedback set was close to the response set in size,
subjects frequently said that they watched for the nonreinforced
responses. The changes observed in the study latencies could result
from such a practice, with the rise due to the increasing number of
responses known to be incorrect, followed by the switchover, and then
the decline in latency with the decreasing number of responses
considered possibly correct. It should be noted that by using such a
method to intersect first complements of reinforcement sets, and then
the sets themselves, a subject needed at most 5 items in memory per
problem. With two problems at once, he needed at most 10, but since the
feedback sets on each problem were independent, the likelihood of both
problems having maximum space at once was low. Thus for the most part
all relevant information could be stored in fewer items than the 7 or 8
generally estimated as maximum for short-term memory.

Part Four/Three

A MICROSTUDY OF HUMAN INFORMATION-SEEKING BEHAVIOR

I. INTRODUCTION

Learning may be considered to be the utilization of information
in order to change one's beliefs concerning the optimality of each one
of a set of possible responses. This information may be of a
particularly simple sort; in paired-associate learning, for example,
after each trial when there is a correction procedure the subject is
given complete information concerning the correctness of each response.
In a recent paper by Jamison, Lhamon, and Suppes [8] a number of
paired-associate learning situations in which much richer information
structures could be analyzed were modeled and discussed. In the
concluding section of that paper the alternative types of information
that can be used to influence learning were categorized and discussed
in terms of the way in which the information does affect the learning
process. My purpose in this section is to look in a very simple way at
adding one further complication to this analysis: that further
complication is introduced by the possibility of buying information.
When information is not free the class of decisions that the decision
maker is confronted with is vastly increased as he must decide on how
much and what type of information to purchase. The experiment to be
described is a natural follow-on to one reported in a study by Keller,
Cole, Burke, and Estes [9]. It is thus worth briefly recalling their
procedure.

The Keller, Cole, Burke, and Estes paper analyzes information
structures that are much richer than that of ordinary paired-associate
learning, though the type of information structure that they analyze
is quite different from those analyzed in Jamison, Lhamon, and Suppes.

In Keller, et al., there were two groups of subjects, each of which
was faced with a paired-associate list of 25 items. There were two
possible responses to each item, and to each response there was
assigned a point value that had a numerical value between 1 and 8. At
the outset of the experiment the subjects did not know the point value
of any of the responses; their pay at the end of the experiment was
directly proportional to the total number of points that they
accumulated during the experimental session. They accumulated points
for each response they made; that is, they received on each response
the point value of that response. The two experimental conditions were
these. In the first, after the subject responded he was told the
point value of both the response he had made and the alternative
response; that is, he was given complete information about what the
optimal response was. In the second experimental condition the subject
was given the point value only of the response that he had made. Thus,
unless he received an 8 or 1, the maximum or minimum possible, he was
uncertain as to whether the response he had selected would be, in fact,
optimal. The primary purpose of Keller, et al., was to examine how both
the information value of the reinforcement and its reward value affect
the subject's performance. In this study I focus on a single aspect of
their results, that is, how the subject decides whether or not to
acquire information concerning another response when he already has a
high reward value, say 6, as a result of his first response. This is an
issue that arises clearly in their data. It turns out that when the two
reward values associated with the two response alternatives are, say,
6 and 8, then occasionally if the subject first responds with the
alternative having the value of 6, he may never learn that the
correct response is 8. It is implied that this is a failure of the
subject to properly learn the material at hand; an alternative
interpretation, to be developed below, is that the cost of switching
to look at the other value is simply too high for the subject in terms
of its expected value. In order to isolate how subjects behave when
faced with choices about buying information the experiment described
below attempts to provide a task in which the learning problem is so
simple that it need not be analyzed. In effect, it is a rerun of the
Keller, et al., experiment with a single stimulus item instead of a
list of 25 items.

The  problem  to  be  investigated  concerns,  then,  how  the  value  of 
the  response  alternative  that  the  subject  knows  affects  his  decision 
concerning  whether  or  not  to  look  at  the  other  alternative  and  how  the 
expected  number  of  remaining  trials  affects  that  decision.  This  last 
was not a variable explicitly considered in Keller, et al.; it is a
variable  explicitly  given  to  the  subject  in  the  experiment  described 
below.  Before  describing  the  method  of  the  experiment,  a  brief  theo¬ 
retical  development  will  be  required. 

II.  THEORETICAL  DEVELOPMENT 


The basis of the theoretical model to be described below is the
assumption that the subject is trying to maximize his expected total
point value against an "objective" probability distribution that he
knows. (This could be generalized to allow for a utility function
nonlinear in points and a subjective probability distribution.) The
subject is shown a card with a point value, R, between 50 and 100 on
the front and a number, N, that indicates the number of trials
remaining. He can then elect to do one of two things: stay or switch. If
he stays, the number of points he receives is R for each trial, i.e.,
a total of NR. If he switches he will receive on the first trial a
point value randomly chosen between 0 and 100; this point value is
written on the back of the card and, for the remaining N-1 trials, he
receives the larger of the point values written on the front and back
of the card. The analogy between this and the Keller, et al.,
experiment is obvious. Since his expected value for the first trial is 50
points (assuming a symmetrical distribution), by switching he gives up
the difference between 50 and what he knows for certain he can obtain
from the value on the front of the card. He does so in the hope that
the number on the back of the card is sufficiently greater than the
number on the front so that the expected loss can be made up in the
remaining N-1 trials.
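
The payoff rules just described can be written out as a short Python
sketch. The function name `total_points` and its argument names are
illustrative labels, not part of the original study; only the rules
themselves are taken from the description above.

```python
def total_points(front: float, back: float, trials: int, switch: bool) -> float:
    """Total points earned on one card under the rules described above.

    front  -- known point value R on the front of the card
    back   -- hidden point value on the back of the card
    trials -- number of trials remaining, N (at least 1)
    switch -- True if the subject chooses to look at the back of the card
    """
    if trials < 1:
        raise ValueError("trials must be at least 1")
    if not switch:
        # Staying pays the front value on every trial: N * R.
        return trials * front
    # Switching pays the back value on the first trial, then the larger
    # of the two values on each of the remaining N - 1 trials.
    return back + (trials - 1) * max(front, back)


# Front value 75 with 9 trials remaining (example numbers only).
stay = total_points(75, 90, 9, switch=False)     # 9 * 75
lucky = total_points(75, 90, 9, switch=True)     # 90 + 8 * 90
unlucky = total_points(75, 20, 9, switch=True)   # 20 + 8 * 75
print(stay, lucky, unlucky)
```

In this example staying yields 675 points, a switch that uncovers 90
yields 810, and a switch that uncovers 20 yields 620, which shows the
short-term sacrifice and the possible long-term gain side by side.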

Under what circumstances should the subject switch, assuming
maximization of expected point value? Let $V_{\text{switch}}$ be the expected
point value of switching and $V_{\text{stay}}$ be the expected value of
staying. $G = V_{\text{switch}} - V_{\text{stay}}$ is the expected gain from
switching; the subject should switch if $G \ge 0$. As previously noted,
$V_{\text{stay}} = NR$. $V_{\text{switch}}$ will depend on the distribution of
the point value on the back of the card. In the experiment we used a
uniform distribution and I will make that assumption here;
generalization to an arbitrary distribution is straightforward. Let $p$ be the
probability of improving if you switch and $\bar{R}$ be the expected point
value of the back of the card given that it is an improvement over the
front. Then, and here the assumption of uniform distribution is used:

$$p = 1 - R/100 \quad \text{and} \quad \bar{R} = 50 + R/2.$$

We can now express $V_{\text{switch}}$ in these terms:

$$V_{\text{switch}} = 50 + (1 - p)(N - 1)R + p(N - 1)\bar{R}.$$

The 50 is the expected value of the first trial; with probability $1 - p$
the subject doesn't improve and hence receives $(N-1)R$ more points; with
probability $p$ he does improve and receives $(N-1)\bar{R}$ more points. By
substitution it is now possible to express $N$ in terms only of $R$ and $G$,
the expected gain from switching:

$$\frac{N-1}{100} = \frac{2G + 2R - 100}{(100-R)^2}. \tag{1}$$

By  setting  G  =  0  we  obtain  a  relation  between  N  and  R  such  that  the 
subject should be indifferent about switching. This relationship is
graphed  in  Figure  1;  above  the  line  he  should  switch  and  below  it  he 
should  not. 
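
The substitution step is not written out in the text; one way to fill
it in, using only the definitions given above, is sketched here. In
this sketch $V_{\text{switch}}$ and $V_{\text{stay}}$ denote the expected point
values of switching and staying, and $\bar{R}$ denotes the expected value
of the back of the card given that it improves on the front; these
labels are adopted for the sketch.

```latex
\begin{align*}
G &= V_{\text{switch}} - V_{\text{stay}} \\
  &= 50 + (1-p)(N-1)R + p(N-1)\bar{R} - NR \\
  &= 50 - R + (N-1)\,p\,(\bar{R} - R) \\
  &= 50 - R + (N-1)\left(1 - \frac{R}{100}\right)\left(50 - \frac{R}{2}\right) \\
  &= 50 - R + \frac{(N-1)(100-R)^2}{200}.
\end{align*}
% Solving the last line for N gives Equation 1:
\[
\frac{N-1}{100} = \frac{2G + 2R - 100}{(100-R)^2},
\]
% and setting G = 0 gives the indifference relation between N and R:
\[
N = 1 + \frac{100\,(2R - 100)}{(100-R)^2}.
\]
```

The term $50 - R$ is the expected loss on the first trial, and the
remaining term is the expected improvement accumulated over the other
$N-1$ trials.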


Insert  Figure  1  About  Here 


In the experiment to be described we selected a number of discrete
values of G (ranging from -15 to 10), put them into Equation 1,
and computed a number of N,R pairs consistent with that value of G.
The hope was that the observed probability of switching would be a
simple monotonically increasing function of G. Though our experiment
failed to bear out this hope, the probability of switching did tend to
increase with increasing G.
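
As an illustration of how such pairs could be computed, the Python
sketch below solves Equation 1 for N at a chosen G and R and then
reports the gain at the nearest whole number of trials. The function
names and the example values are assumptions made for illustration; the
text does not say how the actual cards were generated.

```python
def expected_gain(trials: int, front: float) -> float:
    """Expected gain G from switching, for N trials and front value R.

    Assumes the back value is uniform on [0, 100], as in the text:
    G = 50 - R + (N - 1) * (100 - R)**2 / 200.
    """
    return 50 - front + (trials - 1) * (100 - front) ** 2 / 200


def trials_for_gain(gain: float, front: float) -> float:
    """Solve Equation 1 for N, given a target G and a front value R < 100."""
    return 1 + 100 * (2 * gain + 2 * front - 100) / (100 - front) ** 2


# Hypothetical example: look for a card with R = 70 and G near -10.
exact_n = trials_for_gain(-10, 70)       # generally not a whole number
nearest_n = max(1, round(exact_n))       # a card needs a whole number of trials
achieved_gain = expected_gain(nearest_n, 70)
print(exact_n, nearest_n, achieved_gain)
```

With R = 70 and a target of G = -10, Equation 1 gives N of about 3.2, so
the nearest usable card has N = 3, for which G = -11. Because N must be
a whole number, cards can only approximate a chosen G value.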

It is perhaps worth making one final comment concerning the results
of Keller, et al. From the rapid rise in the curve in Figure 1, it is
perhaps not surprising that subjects would get locked into a 6 response
when the alternative had a point value of 8. This situation would
correspond roughly to a point value of 75 in the schema depicted in Figure
1. For switching to be optimal in these circumstances the subject
would have to expect at least 9 more trials with that stimulus item
prior to the end of the experiment.
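
The figure of 9 trials can be checked directly from Equation 1, taken
here in the form $(N-1)/100 = (2G + 2R - 100)/(100-R)^2$, by setting
G = 0 and R = 75:

```latex
\[
\frac{N-1}{100} = \frac{2(0) + 2(75) - 100}{(100-75)^2}
                = \frac{50}{625} = 0.08
\quad\Longrightarrow\quad N = 9 .
\]
```

At N = 9 the expected gain is exactly zero, so switching satisfies the
criterion G >= 0 only when at least 9 trials remain.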

III.  METHOD 

The experiment was run in the spring of 1969 with 29 female
undergraduates from Boston University as subjects. They participated on a
voluntary basis; they received no pay and were not satisfying any
course requirements. Each subject attended one experimental session
of approximately 20-30 minutes duration. Each subject was presented
with 36 cards and shown the front of each card. On the front were two
numbers, one designated N and the other V. The subjects were told N
was the number of trials remaining and that V (denoted R in the
theoretical development above) was the point value of staying with the
number on the front of the card. The subjects were told they could, if
they wished, switch and see the number on the back of the card. If that
number were higher than the number on the front of the card, they would
receive that for the remaining N trials. If, on the other hand, the
number on the front of the card were higher, they would receive the
number on the back of the card for the first trial and the number on
the front for the remaining N-1 trials. They were told that the numbers
on the back of the card would be uniformly distributed between 0 and
100. The meaning of "uniform distribution" was carefully explained in
intuitive terms. They were told that their objective was to try to
maximize the total number of points they accumulated over the 36 cards.
Considerations that might lead them to wish to switch or not switch
were further explained in some detail; that is, a high point value was
explained to be a pressure not to switch and a large N value was
explained to be a pressure to switch. These points were explained until
the subject showed an understanding of the considerations involved;
that is, the subject realized that by switching they were sacrificing
some points in the short term in order to take advantage of the
possibility of receiving more points in the longer term.

Table 1 shows the N and V values of the 36 cards. The N and V
values were chosen to cluster around each of a number of different G
values between -15 and +10. The G values represented were -15, -10,
-8, -5, -2, 0, 2, 5, 8, and 10. Each G value was represented by from
three to five cards. All subjects were shown the same cards and each
subject responded once to each card.
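
To illustrate how cards cluster around a nominal G value, the Python
sketch below computes G for four of the (N, V) pairs listed in Table 1,
using the expression G = 50 - V + (N-1)(100-V)^2/200 that follows from
the uniform-distribution model of Section II. The function name is
illustrative, and these four cards are picked here as an example; the
text does not state which cards belong to which G value.

```python
def expected_gain(trials: int, value: float) -> float:
    """Expected gain from switching under the uniform-distribution model."""
    return 50 - value + (trials - 1) * (100 - value) ** 2 / 200


# Four (N, V) pairs taken from Table 1.
cards = [(1, 65), (4, 75), (2, 70), (8, 80)]

gains = [expected_gain(trials, value) for trials, value in cards]
for (trials, value), gain in zip(cards, gains):
    print(f"N={trials:2d}  V={value}  G={gain:.3f}")
```

All four values fall between -16 and -15, that is, close to the nominal
G value of -15.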

IV.  RESULTS 

Table 1 also shows the results for the experiment on a card-by-card
basis. The final column of Table 1 shows the percentage of the
subjects who switched for each card. These results are shown in a
more meaningful form in Table 2. There the percentage that switched,
averaged across cards for each G value, is listed against the
various G values. It is clear from Table 2 that the probability of
switching, or the mean switching value, is not related in a very clear
and systematic way to the G value as would be predicted from a theory
based on maximization of expected point value. Nevertheless, it is
also clear that the expected gain from switching, that is, the
G value, does influence the probability of switching; for those G
values less than zero, the average probability of switching was .41.
For those G values above zero, the average probability of switching
was .56. Nevertheless, it is clear that there is considerable erratic
and, as yet, unexplained variation within the numbers given in Table 2.


Table 1

PERCENTAGE OF SUBJECTS SWITCHING

  N     V    % Switch          N     V    % Switch

  1    65    17                1    52    52
  3    70    21                3    63    52
  8    75    31                4    63    52
  4    75    34                3    58    52
  2    70    34.5              3    68    55
  8    80    38                1    58    55
  1    60    38                2    58    55
 10    80    42                2    55    55
  6    74    42                3    60    55
  6    75    45                1    55    58
  7    75    45                2    60    58
  3    65    45                9    75    58
  4    70    48                7    73    62
  6    70    48                8    73    62
  4    65    48                1    50    65.3
  2    52    48                2    50    65.5
  2    64    52                5    65    65.5
  5    70    52                7    70    69


Table 2

PERCENTAGE SWITCHING RELATED TO GAIN

Gain, G       % Switching

-15 (4)a      31
-10 (4)       36
 -8 (4)       51
 -5 (3)       50
 -2 (5)       48
  0 (3)       62
  2 (3)       52
  5 (4)       55
  8 (3)       57
 10 (3)       61

a. The number in parentheses is the number of cards having N,V values
that give the G value indicated. Thus the total of the numbers in
parentheses is 36.
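
The lack of a strictly monotone relation can be checked mechanically.
The Python sketch below uses the G values and percentages as printed in
Table 2 and lists the adjacent pairs in which the switching percentage
falls as G rises; the variable names are illustrative.

```python
# (G value, % switching) pairs as printed in Table 2, in order of increasing G.
table_2 = [
    (-15, 31), (-10, 36), (-8, 51), (-5, 50), (-2, 48),
    (0, 62), (2, 52), (5, 55), (8, 57), (10, 61),
]

percentages = [pct for _, pct in table_2]

# Adjacent pairs of G values where the percentage drops although G increases.
reversals = [
    (table_2[i][0], table_2[i + 1][0])
    for i in range(len(percentages) - 1)
    if percentages[i + 1] < percentages[i]
]

is_monotone = not reversals
print(is_monotone, reversals)
```

Three of the nine adjacent steps are reversals (from -8 to -5, from -5
to -2, and from 0 to 2), while the overall trend is upward, from 31 at
G = -15 to 61 at G = 10.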


The primary result of this experiment is to show that it is
possible to analyze information-seeking behavior in a simple
micro-task, though as yet there is not a clear theory to explain the
results. Nevertheless, the results do appear to be at least influenced
by the expected point value of the information to be obtained. The
problem now is to look at other influences that might be affecting
switching behavior, such as curiosity, undue attention to the
relevant point value of the alternative given at present, undue attention
to the number of remaining trials, and simple random components.

Section Four